DESCRIPTION
Method, Apparatus and Computer Program for Calculating and 
Adjusting the Perceived Loudness of an Audio Signal. 

Technical Field 

The present invention is related to loudness measurements of audio signals and
to apparatuses, methods, and computer programs for controlling the loudness of 
audio signals in response to such measurements. 

Background Art 

Loudness is a subjectively perceived attribute of auditory sensation by which
sound can be ordered on a scale extending from quiet to loud. Because loudness is a 
sensation perceived by a listener, it is not suited to direct physical measurement, 
therefore making it difficult to quantify. In addition, due to the perceptual 
component of loudness, different listeners with "normal" hearing may have different 
perceptions of the same sound. The only way to reduce the variations introduced by 
individual perception and to arrive at a general measure of the loudness of audio
material is to assemble a group of listeners and derive a loudness figure, or ranking, 
statistically. This is clearly an impractical approach for standard, day-to-day, 
loudness measurements. 

There have been many attempts to develop a satisfactory objective method of 
measuring loudness. Fletcher and Munson determined in 1933 that human hearing is 
less sensitive at low and high frequencies than at middle (or voice) frequencies. 
They also found that the relative change in sensitivity decreased as the level of the 
sound increased. An early loudness meter consisted of a microphone, amplifier, 
meter and a combination of filters designed to roughly mimic the frequency response 
of hearing at low, medium and high sound levels.

Even though such devices provided a measurement of the loudness of a single, 
constant level, isolated tone, measurements of more complex sounds did not match 
the subjective impressions of loudness very well. Sound level meters of this type
have been standardized but are only used for specific tasks, such as the monitoring
and control of industrial noise.

In the early 1950s, Zwicker and Stevens, among others, extended the work of
Fletcher and Munson in developing a more realistic model of the loudness perception
process. Stevens published a method for the "Calculation of the Loudness of
Complex Noise" in the Journal of the Acoustical Society of America in 1956, and
Zwicker published his "Psychological and Methodical Basis of Loudness" article in
Acoustica in 1958. In 1959 Zwicker published a graphical procedure for loudness
calculation, as well as several similar articles shortly after. The Stevens and Zwicker
methods were standardized as ISO 532, parts A and B (respectively). Both methods
incorporate standard psychoacoustic phenomena such as critical banding, frequency
masking and specific loudness. The methods are based on the division of complex
sounds into components that fall into "critical bands" of frequencies, allowing some
signal components to mask others, and on the addition of the specific loudness in
each critical band to arrive at the total loudness of the sound.

Recent research, as evidenced by the Australian Broadcasting Authority's
(ABA) "Investigation into Loudness of Advertisements" (July 2002), has shown that
many advertisements (and some programs) are perceived to be too loud in relation to
the other programs, and therefore are very annoying to the listeners. The ABA's
investigation is only the most recent attempt to address a problem that has existed for
years across virtually all broadcast material and countries. These results show that
audience annoyance due to inconsistent loudness across program material could be
reduced, or eliminated, if reliable, consistent measurements of program loudness
could be made and used to reduce the annoying loudness variations.

The Bark scale is a unit of measurement used in the concept of critical bands.
The critical-band scale is based on the fact that human hearing analyses a broad
spectrum into parts that correspond to smaller critical sub-bands. Adding one critical
band to the next in such a way that the upper limit of the lower critical band is the
lower limit of the next higher critical band leads to the scale of critical-band rate. If
the critical bands are added up this way, then a certain frequency corresponds to each
crossing point. The first critical band spans the range from 0 to 100 Hz, the second
from 100 Hz to 200 Hz, the third from 200 Hz to 300 Hz, and so on up to 500 Hz;
above 500 Hz the bandwidth of each critical band increases. The audible frequency
range of 0 to 16 kHz can be subdivided into 24 abutting critical bands, which
increase in bandwidth with increasing frequency. The critical-band rate ranges from
0 to 24 and has the unit "Bark", defining the Bark scale. The relation between
critical-band rate and frequency is important for understanding many characteristics
of the human ear. See, for example, Psychoacoustics - Facts and Models by E.
Zwicker and H. Fastl, Springer-Verlag, Berlin, 1990.

The Equivalent Rectangular Bandwidth (ERB) scale is a way of measuring
frequency for human hearing that is similar to the Bark scale. Developed by Moore,
Glasberg and Baer, it is a refinement of Zwicker's loudness work. See Moore,
Glasberg and Baer (B. C. J. Moore, B. Glasberg, T. Baer, "A Model for the
Prediction of Thresholds, Loudness, and Partial Loudness," Journal of the Audio
Engineering Society, Vol. 45, No. 4, April 1997, pp. 224-240). The measurement of
critical bands below 500 Hz is difficult because at such low frequencies the
efficiency and sensitivity of the human auditory system diminish rapidly.
Improved measurements of the auditory-filter bandwidth have led to the ERB-rate
scale. Such measurements used notched-noise maskers to measure the auditory filter
bandwidth. In general, for the ERB scale the auditory-filter bandwidth (expressed in
units of ERB) is smaller than on the Bark scale. The difference becomes larger for
lower frequencies.

The frequency selectivity of the human hearing system can be approximated
by subdividing the intensity of sound into parts that fall into critical bands. Such an
approximation leads to the notion of critical band intensities. If, instead of an
infinitely steep slope of the hypothetical critical band filters, the actual slope
produced in the human hearing system is considered, then such a procedure leads to
an intermediate value of intensity called excitation. Usually, such values are not used
as linear values but as logarithmic values similar to sound pressure level. The
critical-band and excitation levels are the corresponding values that play an important
role in many models as intermediate values. (See Psychoacoustics - Facts and
Models, supra).

Loudness level may be measured in units of "phon". One phon is defined as
the perceived loudness of a 1 kHz pure sine wave played at 1 dB sound pressure level
(SPL), where the 0 dB SPL reference corresponds to a root mean square pressure of
2×10⁻⁵ pascals. N phon is the perceived loudness of a 1 kHz tone played at N dB SPL.
Using this definition in comparing the loudness of tones at frequencies other than
1 kHz with a tone at 1 kHz, a contour of equal loudness can be determined for a given
level of phon. FIG. 7 shows equal loudness level contours for frequencies between
20 Hz and 12.5 kHz, and for phon levels between 4.2 phon (considered to be the
threshold of hearing) and 120 phon (ISO 226:1987(E), "Acoustics - Normal Equal
Loudness Level Contours").

Loudness may also be measured in units of "sone". There is a one-to-one
mapping between phon units and sone units, as indicated in FIG. 7. One sone is
defined as the loudness of a 40 dB (SPL) 1 kHz pure sine wave and is equivalent to
40 phon. The units of sone are such that a twofold increase in sone corresponds to a
doubling of perceived loudness. For example, 4 sone is perceived as twice as loud as
2 sone. Thus, loudness expressed in sone conveys relative loudness directly, which
loudness level in phon does not.
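The text above fixes only the anchor point (1 sone = 40 phon) and the doubling property. A widely used approximation that links the two scales is that each 10 phon increase doubles the loudness in sone; that rule is an assumption not stated in this document, and it fits equal loudness data well only above roughly 40 phon. A minimal Python sketch of the conversion under that assumption:

```python
import math


def phon_to_sone(phon: float) -> float:
    """Approximate loudness in sone for a loudness level in phon.

    Assumes the common rule that 40 phon = 1 sone and that every +10 phon
    doubles loudness; reasonable only above about 40 phon.
    """
    return 2.0 ** ((phon - 40.0) / 10.0)


def sone_to_phon(sone: float) -> float:
    """Inverse of phon_to_sone, with the same validity range."""
    return 40.0 + 10.0 * math.log2(sone)


if __name__ == "__main__":
    print(phon_to_sone(40.0))  # 1.0
    print(phon_to_sone(50.0))  # 2.0
    print(sone_to_phon(4.0))   # 60.0, i.e. twice as loud as 2 sone (50 phon)
```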
Because the sone is a unit of loudness, specific loudness is simply loudness
per unit frequency. Thus, when using the Bark frequency scale, specific loudness has
units of sone per Bark, and likewise, when using the ERB frequency scale, the units
are sone per ERB.

Throughout the remainder of this document, terms such as "filter" or
"filterbank" are used to include essentially any form of recursive and non-recursive
filtering such as IIR filters or transforms, and "filtered" information is the result of
applying such filters. Embodiments described below employ filterbanks implemented
by IIR filters and by transforms.
Disclosure of the Invention
According to an aspect of the present invention, a method for processing an
audio signal includes producing, in response to the audio signal, an excitation signal,
and calculating the perceptual loudness of the audio signal in response to the
excitation signal and a measure of characteristics of the audio signal, wherein the
calculating selects, from a group of two or more specific loudness model functions,
one or a combination of two or more of the specific loudness model functions, the
selection of which is controlled by the measure of characteristics of the input audio
signal.

According to another aspect of the present invention, a method for processing
an audio signal includes producing, in response to the audio signal, an excitation
signal, and calculating, in response at least to the excitation signal, a gain value G[t],
which, if applied to the audio signal, would result in a perceived loudness
substantially the same as a reference loudness, the calculating including an iterative
processing loop that includes at least one non-linear process.

According to yet another aspect of the present invention, a method for
processing a plurality of audio signals includes a plurality of processes, each
receiving a respective one of the audio signals, wherein each process produces, in
response to the respective audio signal, an excitation signal, calculates, in response at
least to the excitation signal, a gain value G[t], which, if applied to the audio signal,
would result in a perceived loudness substantially the same as a reference loudness,
the calculating including an iterative processing loop that includes at least one
non-linear process, and controls the amplitude of the respective audio signal with the
gain G[t] so that the resulting perceived loudness of the respective audio signal is
substantially the same as the reference loudness, and applying the same reference
loudness to each of the plurality of processes.

In an embodiment that employs aspects of the invention, a method or device
for signal processing receives an input audio signal. The signal is linearly filtered by
a filter or filter function that simulates the characteristics of the outer and middle
human ear and a filterbank or filterbank function that divides the filtered signal into
frequency bands that simulate the excitation pattern generated along the basilar
membrane of the inner ear. For each frequency band, the specific loudness is
calculated using one or more specific loudness functions or models, the selection of
which is controlled by properties or features extracted from the input audio signal.
The specific loudness for each frequency band is combined into a loudness measure
representative of the wideband input audio signal. A single value of the loudness
measure may be calculated for some finite time range of the input signal, or the
loudness measure may be repetitively calculated on time intervals or blocks of the
input audio signal.

In another embodiment that employs aspects of the invention, a method or
device for signal processing receives an input audio signal. The signal is linearly
filtered by a filter or filter function that simulates the characteristics of the outer and
middle human ear and a filterbank or filterbank function that divides the filtered
signal into frequency bands that simulate the excitation pattern generated along the
basilar membrane of the inner ear. For each frequency band, the specific loudness is
calculated using one or more specific loudness functions or models, the selection of
which is controlled by properties or features extracted from the input audio signal.
The specific loudness for each frequency band is combined into a loudness measure
representative of the wideband input audio signal. The loudness measure is
compared with a reference loudness value and the difference is used to scale or gain
adjust the frequency-banded signals previously input to the specific loudness
calculation. The specific loudness calculation, loudness calculation and reference
comparison are repeated until the loudness and the reference loudness value are
substantially equivalent. Thus, the gain applied to the frequency-banded signals
represents the gain which, when applied to the input audio signal, results in the
perceived loudness of the input audio signal being essentially equivalent to the
reference loudness. A single value of the loudness measure may be calculated for
some finite range of the input signal, or the loudness measure may be repetitively
calculated on time intervals or blocks of the input audio signal. A recursive
application of gain is preferred due to the non-linear nature of perceived loudness as
well as the structure of the loudness measurement process.

The various aspects of the present invention and its preferred embodiments
may be better understood by referring to the following disclosure and the
accompanying drawings, in which like reference numerals refer to like
elements in the several figures. The drawings, which illustrate various devices or
processes, show major elements that are helpful in understanding the present
invention. For the sake of clarity, the drawings omit many other features that may be
important in practical embodiments and are well known to those of ordinary skill in
the art but are not important to understanding the concepts of the present invention.
The signal processing for practicing the present invention may be accomplished in a
wide variety of ways including programs executed by microprocessors, digital signal
processors, logic arrays and other forms of computing circuitry.

Description of the Drawings

FIG. 1 is a schematic functional block diagram of an embodiment of an aspect 
of the present invention. 

FIG. 2 is a schematic functional block diagram of an embodiment of a further
aspect of the present invention.

FIG. 3 is a schematic functional block diagram of an embodiment of yet a
further aspect of the present invention.

FIG. 4 is an idealized characteristic response of a linear filter P(z) suitable as a
transmission filter in an embodiment of the present invention. The vertical
axis is attenuation in decibels (dB) and the horizontal axis is frequency in hertz (Hz)
on a logarithmic (base 10) scale.

FIG. 5 shows the relationship between the ERB frequency scale (vertical axis) 
and frequency in Hertz (horizontal axis). 

FIG. 6 shows a set of idealized auditory filter characteristic responses that
approximate critical banding on the ERB scale. The horizontal scale is frequency in
Hertz and the vertical scale is level in decibels.
FIG. 7 shows the equal loudness contours of ISO 226. The horizontal scale is
frequency in Hertz (logarithmic base 10 scale) and the vertical scale is sound
pressure level in decibels.

FIG. 8 shows the equal loudness contours of ISO 226 normalized by the
transmission filter P(z). The horizontal scale is frequency in Hertz (logarithmic base
10 scale) and the vertical scale is sound pressure level in decibels.

FIG. 9 shows plots of loudness for both uniform-exciting noise and a 1 kHz
tone, in which the solid lines are in accordance with an embodiment of the present
invention whose parameters are chosen to match experimental data according to
Zwicker (squares and circles). The vertical scale is loudness in sone
(logarithmic base 10) and the horizontal scale is sound pressure level in decibels.

FIG. 10 is a schematic functional block diagram of an embodiment of a further 
aspect of the present invention. 

FIG. 11 is a schematic functional block diagram of an embodiment of yet a
further aspect of the present invention.

FIG. 12 is a schematic functional block diagram of an embodiment of another 
aspect of the present invention. 

FIG. 13 is a schematic functional block diagram of an embodiment of another 
aspect of the present invention. 


Best Modes for Carrying Out the Invention 
As described in greater detail below, an embodiment of a first aspect of the
present invention, shown in FIG. 1, includes a specific loudness controller or
controller function ("Specific Loudness Control") 124 that analyzes and derives
characteristics of an input audio signal. The audio characteristics are used to control
parameters in a specific loudness converter or converter function ("Specific
Loudness") 120. By adjusting the specific loudness parameters using signal
characteristics, the objective loudness measurement technique of the present
invention may be matched more closely to subjective loudness results produced by
statistically measuring loudness using multiple human listeners. The use of signal
characteristics to control loudness parameters may also reduce the occurrence of
incorrect measurements that result in signal loudness deemed annoying to listeners.

As described in greater detail below, an embodiment of a second aspect of the
present invention, shown in FIG. 2, adds a gain device or function ("Iterative Gain
Update") 233, the purpose of which is to iteratively adjust the gain of the
time-averaged excitation signal derived from the input audio signal until the associated
loudness at 223 in FIG. 2 matches a desired reference loudness at 230 in FIG. 2.
Because the objective measurement of perceived loudness involves an inherently
non-linear process, an iterative loop may be advantageously employed to determine
an appropriate gain to match the loudness of the input audio signal to a desired
loudness level. However, an iterative gain loop surrounding an entire loudness
measurement system, such that the gain adjustment is applied to the original input
audio signal for each loudness iteration, would be expensive to implement due to the
temporal integration required to generate an accurate measure of long-term loudness.
In general, in such an arrangement, the temporal integration requires recomputation
for each change of gain in the iteration. However, as is explained further below, in
the aspects of the invention shown in the embodiments of FIG. 2 and also FIGS. 3
and 10-12, the temporal integration may be performed in linear processing paths that
precede and/or follow the non-linear process that forms part of the iterative gain
loop. Linear processing paths need not form a part of the iteration loop. Thus, for
example, in the embodiment of FIG. 2, the loudness measurement path from input
201 to a specific loudness converter or converter function ("Specific Loudness") 220
may include the temporal integration in time averaging function ("Time Averaging")
206, and is linear. Consequently, the gain iterations need only be applied to a
reduced set of loudness measurement devices or functions and need not include any
temporal integration. In the embodiment of FIG. 2, the transmission filter or
transmission filter function ("Transmission Filter") 202, the filter bank or filter bank
function ("Filterbank") 204, the time averager or time averaging function ("Time
Averaging") 206 and the specific loudness controller or specific loudness control
function ("Specific Loudness Control") 224 are not part of the iterative loop,
permitting iterative gain control to be implemented in efficient and accurate real-time
systems.
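The benefit of keeping temporal integration out of the loop can be sketched in Python. In this minimal sketch, the time-averaged excitation of the current block is computed once, outside the loop, and only a non-linear loudness step is re-evaluated while the gain is searched. The power-law `loudness` function (exponent 0.23) and the bisection search on gain in dB are hypothetical stand-ins chosen for illustration; they are not the Specific Loudness 220 model or the update rule of Iterative Gain Update 233.

```python
def loudness(avg_excitation, exponent=0.23):
    """Hypothetical non-linear loudness: a compressive power law per band,
    summed over bands. Stands in for the specific loudness + summation stage."""
    return sum(e ** exponent for e in avg_excitation)


def gain_for_reference(avg_excitation, reference,
                       lo_db=-100.0, hi_db=100.0, iterations=60):
    """Return an amplitude gain G such that loudness(G**2 * E) ~= reference.

    avg_excitation holds already time-averaged band energies for the current
    block, so each iteration touches only this block: neither the filterbank
    nor the temporal integration is recomputed inside the loop. Bisection is
    valid here because the stand-in loudness model increases monotonically
    with gain.
    """
    for _ in range(iterations):
        mid_db = 0.5 * (lo_db + hi_db)
        power_gain = 10.0 ** (mid_db / 10.0)  # excitation is energy: scales with G**2
        if loudness([power_gain * e for e in avg_excitation]) < reference:
            lo_db = mid_db
        else:
            hi_db = mid_db
    return 10.0 ** (0.5 * (lo_db + hi_db) / 20.0)  # dB -> amplitude gain


if __name__ == "__main__":
    bands = [1.0, 2.0, 4.0]              # toy time-averaged excitation
    target = 2.0 * loudness(bands)       # ask for twice the current loudness
    g = gain_for_reference(bands, target)
    print(g, 2.0 ** (1.0 / 0.46))        # the two values should agree closely
```

For the power law used here the answer is known in closed form, (G²)^0.23 = 2, which is why the sketch prints it alongside the iterated gain.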



Referring again to FIG. 1, a functional block diagram of an embodiment of a 
loudness measurer or loudness measuring process 100 according to a first aspect of
the present invention is shown. An audio signal for which a loudness measurement is 
to be determined is applied to an input 101 of the loudness measurer or loudness 
measuring process 100. The input is applied to two paths - a first (main) path that 
calculates specific loudness in each of a plurality of frequency bands that simulate 
those of the excitation pattern generated along the basilar membrane of the inner ear 
and a second (side) path having a specific loudness controller that selects the specific 
loudness functions or models employed in the main path. 

In a preferred embodiment, processing of the audio is performed in the digital
domain. Accordingly, the audio input signal is denoted by the discrete time sequence
x[n], which has been sampled from the audio source at some sampling frequency f_s.
It is assumed that the sequence x[n] has been appropriately scaled so that the rms
power of x[n] in decibels, given by

$$L_{dB} = 10\log_{10}\left(\frac{1}{L}\sum_{n=0}^{L-1} x^2[n]\right),$$

where L is the number of samples considered, is equal to the sound pressure level in
dB at which the audio is being auditioned by a human listener. In addition, the audio
signal is assumed to be monophonic for simplicity of exposition. The embodiment
may, however, be adapted to multi-channel audio in a manner described later.
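The calibration assumption above can be illustrated with a minimal Python sketch. It computes the level of x[n] as 10·log10 of the mean of x²[n] and rescales the samples so that this level equals an assumed playback sound pressure level; the function names and the `playback_spl_db` parameter are hypothetical.

```python
import math


def rms_level_db(x):
    """Mean-square power of the sample sequence x[n], expressed in dB."""
    if not x:
        raise ValueError("empty signal")
    mean_square = sum(s * s for s in x) / len(x)
    return 10.0 * math.log10(mean_square)


def scale_to_spl(x, playback_spl_db):
    """Return x scaled so that rms_level_db() equals playback_spl_db.

    playback_spl_db is an assumed, externally known listening level in dB SPL.
    """
    gain = 10.0 ** ((playback_spl_db - rms_level_db(x)) / 20.0)  # amplitude gain
    return [gain * s for s in x]


if __name__ == "__main__":
    square = [1.0, -1.0] * 4                     # full-scale square wave
    print(rms_level_db(square))                  # 0.0
    print(rms_level_db(scale_to_spl(square, 80.0)))  # approximately 80
```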



Transmission Filter 102

In the main path, the audio input signal is applied to a transmission filter or
transmission filter function ("Transmission Filter") 102, the output of which is a
filtered version of the audio signal. Transmission Filter 102 simulates the effect of
the transmission of audio through the outer and middle ear with the application of a
linear filter P(z). As shown in FIG. 4, one suitable magnitude frequency response of
P(z) is unity below 1 kHz, and, above 1 kHz, the response follows the inverse of the
threshold of hearing as specified in the ISO 226 standard, with the threshold
normalized to equal unity at 1 kHz. By applying a transmission filter, the audio that
is processed by the loudness measurement process more closely resembles the audio
that is perceived in human hearing, thereby improving the objective loudness
measure. Thus, the output of Transmission Filter 102 is a frequency-dependently
scaled version of the time-domain input audio samples x[n].

Filterbank 104 

The filtered audio signal is applied to a filterbank or filterbank function
("Filterbank") 104 (FIG. 1). Filterbank 104 is designed to simulate the excitation
pattern generated along the basilar membrane of the inner ear. The Filterbank 104
may include a set of linear filters whose bandwidth and spacing are constant on the
Equivalent Rectangular Bandwidth (ERB) frequency scale, as defined by Moore,
Glasberg and Baer (B. C. J. Moore, B. Glasberg, T. Baer, "A Model for the
Prediction of Thresholds, Loudness, and Partial Loudness," supra).

Although the ERB frequency scale more closely matches human perception
and shows improved performance in producing objective loudness measurements that
match subjective loudness results, the Bark frequency scale may be employed with
reduced performance.

For a center frequency f in hertz, the width of one ERB band in hertz may be
approximated as:

$$ERB(f) = 24.7\,(4.37f/1000 + 1) \qquad (1)$$

From this relation a warped frequency scale is defined such that at any point
along the warped scale, the corresponding ERB in units of the warped scale is equal
to one. The function for converting from linear frequency in hertz to this ERB
frequency scale is obtained by integrating the reciprocal of Equation 1:

$$\mathrm{HzToERB}(f) = \int \frac{df}{24.7\,(4.37f/1000 + 1)} \approx 21.4\,\log_{10}(4.37f/1000 + 1) \qquad (2a)$$

It is also useful to express the transformation from the ERB scale back to the
linear frequency scale by solving Equation 2a for f:

$$\mathrm{ERBToHz}(e) = f = \frac{1000}{4.37}\left(10^{e/21.4} - 1\right) \qquad (2b)$$

where e is in units of the ERB scale. FIG. 5 shows the relationship between the ERB
scale and frequency in hertz.
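Equations 1, 2a and 2b translate directly into code. The Python sketch below implements them and checks numerically that the slope of HzToERB is close to 1/ERB(f), the property the warped scale was built to have. The match is approximate because the constant 21.4 is a rounded form of 1000·ln(10)/(24.7·4.37) ≈ 21.33. The function names simply mirror the equation names and are otherwise hypothetical.

```python
import math


def erb_bandwidth_hz(f):
    """Equation 1: approximate width in Hz of one ERB band centred at f Hz."""
    return 24.7 * (4.37 * f / 1000.0 + 1.0)


def hz_to_erb(f):
    """Equation 2a: linear frequency in Hz -> position on the ERB scale."""
    return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)


def erb_to_hz(e):
    """Equation 2b: position on the ERB scale -> linear frequency in Hz."""
    return (1000.0 / 4.37) * (10.0 ** (e / 21.4) - 1.0)


if __name__ == "__main__":
    h = 1e-3
    for f in (100.0, 1000.0, 10000.0):
        # Central-difference slope of the warped scale, times ERB(f); ideally 1.
        slope = (hz_to_erb(f + h) - hz_to_erb(f - h)) / (2.0 * h)
        print(f, erb_bandwidth_hz(f), slope * erb_bandwidth_hz(f))  # last column ~1.003
```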

The response of the auditory filters for the Filterbank 104 may be
characterized and implemented using standard IIR filters. More specifically, the
individual auditory filters at center frequency f_c in hertz that are implemented in the
Filterbank 104 may be defined by the twelfth order IIR transfer function:

$$H_{f_c}(z) = G\,\frac{(1 - z^{-1})\left(1 - 2r_B\cos(2\pi f_B/f_s)\,z^{-1} + r_B^2 z^{-2}\right)}{\left(1 - 2r_A\cos(2\pi f_A/f_s)\,z^{-1} + r_A^2 z^{-2}\right)^6} \qquad (3)$$

where

$$f_A = \sqrt{f_c^2 + B_w^2} \qquad (4a)$$

$$r_A = e^{-\pi B_w/f_s} \qquad (4b)$$

$$B_w = \min\{1.55\,ERB(f_c),\ 0.5 f_c\} \qquad (4c)$$

$$f_B = \min\{\mathrm{ERBToHz}(\mathrm{HzToERB}(f_c) + 5.25),\ f_s/2\} \qquad (4d)$$

$$r_B = 0.985 \qquad (4e)$$

f_s is the sampling frequency in hertz, and G is a normalizing factor that ensures
each filter has unity gain at the peak of its frequency response, chosen such that

$$\max_f \left|H_{f_c}\!\left(e^{j2\pi f/f_s}\right)\right| = 1 \qquad (4f)$$

The Filterbank 104 may include M such auditory filters, referred to as bands,
at center frequencies f_c[1] ... f_c[M] spaced uniformly along the ERB scale. More
specifically,

$$f_c[1] = f_{min} \qquad (5a)$$

$$f_c[m] = \mathrm{ERBToHz}(\mathrm{HzToERB}(f_c[m-1]) + \Delta), \quad m = 2..M \qquad (5b)$$

$$f_c[M] < f_{max} \qquad (5c)$$

where Δ is the desired ERB spacing of the Filterbank 104, f_min and f_max are the
desired minimum and maximum center frequencies, respectively, and M is the largest
number of bands for which Equation 5c holds. One may choose Δ = 1, and, taking
into account the frequency range over which the human ear is sensitive, one may set
f_min = 50 Hz and f_max = 20,000 Hz. With such parameters, for example,
application of Equations 5a-c yields M = 40 auditory filters. The magnitudes
of such M auditory filters, which approximate critical banding on the ERB scale, are
shown in FIG. 6.
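The spacing rule of Equations 5a-5c can be checked with a short Python sketch. It re-declares the Equation 2a/2b conversions so the block runs on its own, then steps upward by Δ on the ERB scale until the next center frequency would reach f_max. With Δ = 1, f_min = 50 Hz and f_max = 20,000 Hz it reproduces the count of 40 bands given above. The function name `center_frequencies` is hypothetical.

```python
import math


def hz_to_erb(f):
    return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)     # Equation 2a


def erb_to_hz(e):
    return (1000.0 / 4.37) * (10.0 ** (e / 21.4) - 1.0)  # Equation 2b


def center_frequencies(f_min=50.0, f_max=20000.0, delta=1.0):
    """Center frequencies spaced delta ERB apart, starting at f_min."""
    fc = [f_min]                                          # Equation 5a
    while True:
        nxt = erb_to_hz(hz_to_erb(fc[-1]) + delta)        # Equation 5b
        if nxt >= f_max:                                  # Equation 5c
            break
        fc.append(nxt)
    return fc


if __name__ == "__main__":
    fc = center_frequencies()
    print(len(fc))  # 40
```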

Alternatively, the filtering operations may be adequately approximated using a
finite length Discrete Fourier Transform, commonly referred to as the Short-Time
Discrete Fourier Transform (STDFT), because an implementation running the filters
at the sampling rate of the audio signal, referred to as a full-rate implementation, is
believed to provide more temporal resolution than is necessary for accurate loudness
measurements. By using the STDFT instead of a full-rate implementation, an
improvement in efficiency and reduction in computational complexity may be
achieved.

The STDFT of input audio signal x[n] is defined as:

$$X[k,t] = \sum_{n=0}^{N-1} w[n]\,x[n + tT]\,e^{-j2\pi kn/N} \qquad (6)$$

where k is the frequency index, t is the time block index, N is the DFT size, T is the
hop size, and w[n] is a length N window normalized so that

$$\sum_{n=0}^{N-1} w^2[n] = 1 \qquad (7)$$

Note that the variable t in Equation 6 is a discrete index representing the time
block of the STDFT as opposed to a measure of time in seconds. Each increment in t
represents a hop of T samples along the signal x[n]. Subsequent references to the
index t assume this definition. While different parameter settings and window shapes
may be used depending upon the details of implementation, for f_s = 44100 Hz,
choosing N = 4096, T = 2048, and having w[n] be a Hanning window produces
excellent results. The STDFT described above may be computed more efficiently
using the Fast Fourier Transform (FFT).
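Equations 6 and 7 map onto a frame-by-frame FFT. The NumPy sketch below is one way to compute X[k,t] with the parameters suggested above (N = 4096, T = 2048, Hanning window). It uses `numpy.hanning` for the window, rescales it to satisfy Equation 7, and keeps only complete blocks, a boundary-handling choice the text does not specify. The function name `stdft` is hypothetical.

```python
import numpy as np


def stdft(x, N=4096, T=2048):
    """Short-time DFT of Equation 6; returns X with shape (N, num_blocks)."""
    x = np.asarray(x, dtype=float)
    if len(x) < N:
        raise ValueError("signal shorter than one DFT block")
    w = np.hanning(N)
    w /= np.sqrt(np.sum(w ** 2))          # Equation 7: sum of w^2[n] equals 1
    num_blocks = 1 + (len(x) - N) // T    # complete blocks only
    X = np.empty((N, num_blocks), dtype=complex)
    for t in range(num_blocks):
        block = x[t * T : t * T + N]      # samples x[n + tT], n = 0..N-1
        X[:, t] = np.fft.fft(w * block)   # sum_n w[n] x[n+tT] e^{-j 2 pi k n / N}
    return X
```

Because the window has unit energy, Parseval's relation makes the sum over k of |X[k,t]|²/N equal to the energy of the windowed block, which is consistent with the 1/N factor in the band-energy formula that follows.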

In order to compute the loudness of the input audio signal, a measure of the
audio signal's energy in each filter of the Filterbank 104 is needed. The short-time
energy output of each filter in Filterbank 104 may be approximated through
multiplication of filter responses in the frequency domain with the power spectrum of
the input signal:

$$E[m,t] = \frac{1}{N}\sum_{k=0}^{N-1}\left|H_{f_c[m]}\!\left(e^{j2\pi k/N}\right)\right|^2\left|P\!\left(e^{j2\pi k/N}\right)\right|^2\left|X[k,t]\right|^2 \qquad (8)$$

where m is the band number, t is the block number, and P is the transmission filter. It
should be noted that forms for the magnitude response of the auditory filters other
than that specified in Equation 3 may be used in Equation 8 to achieve similar
results. For example, Moore and Glasberg propose a filter shape described by an
exponential function that performs similarly to Equation 3. In addition, with a slight
reduction in performance, one may approximate each filter as a "brick-wall" band
pass with a bandwidth of one ERB, and as a further approximation, the transmission
filter P may be pulled out of the summation. In this case, Equation 8 simplifies to

$$E[m,t] = \frac{1}{N}\left|P\!\left(e^{j2\pi f_c[m]/f_s}\right)\right|^2\sum_{k=k_1}^{k_2}\left|X[k,t]\right|^2 \qquad (9a)$$

$$k_1 = \mathrm{round}\left(\mathrm{ERBToHz}(\mathrm{HzToERB}(f_c[m]) - 1/2)\,N/f_s\right) \qquad (9b)$$

$$k_2 = \mathrm{round}\left(\mathrm{ERBToHz}(\mathrm{HzToERB}(f_c[m]) + 1/2)\,N/f_s\right) \qquad (9c)$$

Thus, the excitation output of Filterbank 104 is a frequency domain representation of
energy E in respective ERB bands m per time period t.
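The brick-wall simplification of Equations 9a-9c amounts to summing power-spectrum bins between two edges placed half an ERB on either side of each center frequency. In the Python sketch below, the transmission filter magnitude is passed in as a function of frequency in hertz; the default of unity is a placeholder, not the FIG. 4 response, and `band_energies` is a hypothetical name.

```python
import math

import numpy as np


def hz_to_erb(f):
    return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)     # Equation 2a


def erb_to_hz(e):
    return (1000.0 / 4.37) * (10.0 ** (e / 21.4) - 1.0)  # Equation 2b


def band_energies(X, fc, fs, P_mag=lambda f: 1.0):
    """Brick-wall band energies E[m, t] from an STDFT X of shape (N, blocks)."""
    N = X.shape[0]
    power = np.abs(X) ** 2
    E = np.empty((len(fc), X.shape[1]))
    for m, f in enumerate(fc):
        e = hz_to_erb(f)
        k1 = int(round(erb_to_hz(e - 0.5) * N / fs))  # Equation 9b
        k2 = int(round(erb_to_hz(e + 0.5) * N / fs))  # Equation 9c
        # Equation 9a: |P|^2 taken outside the sum, bins k1..k2 inclusive.
        E[m] = P_mag(f) ** 2 * power[k1 : k2 + 1].sum(axis=0) / N
    return E
```

For a 1 kHz band at f_s = 44100 Hz and N = 4096, the edges land near bins 87 and 99, so a single unit-magnitude bin at 1 kHz (bin 93) contributes 1/N to that band while a bin near 2.15 kHz contributes nothing.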

Multi-Channel

For the case when the input audio signal is of a multi-channel format to be
auditioned over multiple loudspeakers, one for each channel, the excitation for each
individual channel may first be computed as described above. In order to
subsequently compute the perceived loudness of all channels combined, the
individual excitations may be summed together into a single excitation to
approximate the excitation reaching the ears of a listener. All subsequent processing
is then performed on this single, summed excitation.

Time Averaging 106

Research in psychoacoustics and subjective loudness tests suggests that when
comparing the loudness between various audio signals, listeners perform some type of
temporal integration of short-term or "instantaneous" signal loudness to arrive at a
value of long-term perceived loudness for use in the comparison. When building a
model of loudness perception, others have suggested that this temporal integration be
performed after the excitation has been transformed non-linearly into specific
loudness. However, the present inventors have determined that this temporal
integration may be adequately modeled using linear smoothing on the excitation
before it is transformed into specific loudness. By performing the smoothing prior to
computation of specific loudness, according to an aspect of the present invention, a
significant advantage is realized when computing the gain that needs to be applied to
a signal in order to adjust its measured loudness in a prescribed manner. As
explained further below, the gain may be calculated by using an iterative loop that
not only excludes the excitation calculation but preferably excludes such temporal
integration. In this manner, the iteration loop may generate the gain through
computations that depend only on the current time frame for which the gain is being
computed, as opposed to computations that depend on the entire time interval of
temporal integration. The result is a savings in both processing time and memory.
Embodiments that calculate a gain using an iterative loop include those described
below in connection with FIGS. 2, 3, and 10-12.

Returning to the description of FIG. 1, linear smoothing of the excitation may be implemented in various ways. For example, smoothing may be performed recursively using a time averaging device or function ("Time Averaging") 106 employing the following equations:

$$\tilde{E}[m,t] = \tilde{E}[m,t-1] + \frac{1}{\tilde{\alpha}[m,t]}\left(E[m,t] - \tilde{E}[m,t-1]\right) \tag{10a}$$

$$\tilde{\alpha}[m,t] = \lambda_m\,\tilde{\alpha}[m,t-1] + 1 \tag{10b}$$

where the initial conditions are $\tilde{E}[m,-1] = 0$ and $\tilde{\alpha}[m,-1] = 0$. A unique feature of the smoothing filter is that by varying the smoothing parameter $\lambda_m$, the smoothed energy $\tilde{E}[m,t]$ may vary from the true time average of $E[m,t]$ to a fading-memory average of $E[m,t]$. If $\lambda_m = 1$, then from (10b) it may be seen that $\tilde{\alpha}[m,t] = t + 1$, and $\tilde{E}[m,t]$ is then equal to the true time average of $E[m,t]$ over time blocks 0 up to $t$. If $0 < \lambda_m < 1$, then $\tilde{\alpha}[m,t] \to 1/(1-\lambda_m)$ as $t \to \infty$, and $\tilde{E}[m,t]$ is simply the result of applying a one-pole smoother to $E[m,t]$. For an application where a single number describing the long-term loudness of a finite-length audio segment is desired, one may set $\lambda_m = 1$ for all $m$. For a real-time application that tracks the time-varying long-term loudness of a continuous audio stream, one may set $0 < \lambda_m < 1$, using the same value of $\lambda_m$ for all $m$.
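Equation 10 can be sketched for a single band as below. This is a minimal illustration, assuming the excitation values of one band arrive one block at a time as plain floats; the function name `smooth_excitation` is hypothetical.

```python
def smooth_excitation(E, lam):
    """Recursive time average of Equation 10 for one band m.

    E   -- banded excitation values E[m, t] for t = 0, 1, 2, ...
    lam -- smoothing parameter lambda_m: 1.0 gives the true running mean,
           0 < lam < 1 gives a one-pole ("fading memory") average.
    Returns the smoothed values E_tilde[m, t] for every t.
    """
    e_tilde = 0.0  # initial condition E_tilde[m, -1] = 0
    alpha = 0.0    # initial condition alpha_tilde[m, -1] = 0
    out = []
    for e in E:
        alpha = lam * alpha + 1.0          # Equation 10b
        e_tilde += (e - e_tilde) / alpha   # Equation 10a
        out.append(e_tilde)
    return out


# With lam = 1 the output is the running mean of the input.
print(smooth_excitation([1.0, 2.0, 3.0, 6.0], 1.0))  # [1.0, 1.5, 2.0, 3.0]
```

With `lam` between 0 and 1, `alpha` settles at 1/(1 - lam), so each new block is weighted by (1 - lam), which is exactly a one-pole smoother.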
In computing the time average of $E[m,t]$, it may be desirable to omit short-time segments that are considered "too quiet" and do not contribute to the perceived loudness. To achieve this, a second, thresholded smoother may be run in parallel with the smoother of Equation 10. This second smoother holds its current value if $E[m,t]$ is small relative to $\tilde{E}[m,t]$:

$$\bar{E}[m,t] = \begin{cases} \bar{E}[m,t-1] + \dfrac{1}{\bar{\alpha}[m,t]}\left(E[m,t] - \bar{E}[m,t-1]\right), & E[m,t] > 10^{t_{dB}/10}\,\tilde{E}[m,t] \\ \bar{E}[m,t-1], & \text{otherwise} \end{cases} \tag{11a}$$

$$\bar{\alpha}[m,t] = \begin{cases} \lambda_m\,\bar{\alpha}[m,t-1] + 1, & E[m,t] > 10^{t_{dB}/10}\,\tilde{E}[m,t] \\ \bar{\alpha}[m,t-1], & \text{otherwise} \end{cases} \tag{11b}$$

where $t_{dB}$ is the relative threshold specified in decibels. Although it is not critical to the invention, a value of $t_{dB} = -24$ has been found to produce good results. If there is no second smoother running in parallel, then $\bar{E}[m,t] = \tilde{E}[m,t]$.
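A minimal sketch of the two smoothers of Equations 10 and 11 running in parallel for one band follows. It assumes non-negative float excitation values; the names `thresholded_smoother`, `e_tilde` and `e_bar` are illustrative only.

```python
def thresholded_smoother(E, lam, t_db=-24.0):
    """Equations 10 and 11 for one band m, returning E_bar[m, t].

    The second smoother updates only when the current excitation E[m, t]
    exceeds 10**(t_db / 10) times the first smoother's output; otherwise it
    holds its previous value, so "too quiet" blocks are skipped.
    """
    thresh = 10.0 ** (t_db / 10.0)
    e_tilde = a_tilde = 0.0   # first smoother (Equation 10)
    e_bar = a_bar = 0.0       # second, thresholded smoother (Equation 11)
    out = []
    for e in E:
        a_tilde = lam * a_tilde + 1.0
        e_tilde += (e - e_tilde) / a_tilde
        if e > thresh * e_tilde:          # block is loud enough to count
            a_bar = lam * a_bar + 1.0     # Equation 11b, first case
            e_bar += (e - e_bar) / a_bar  # Equation 11a, first case
        # otherwise e_bar and a_bar keep their previous values
        out.append(e_bar)
    return out
```

For the input `[1.0, 1.0, 1e-6, 1e-6, 1.0]` with `lam = 1`, the plain smoother of Equation 10 ends near 0.6, while this thresholded output stays at 1.0 because the two near-silent blocks are ignored. Setting `t_db` to minus infinity makes the threshold zero, so for strictly positive excitation the second smoother then tracks the first one.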

Specific Loudness 120

It remains for the banded, time-averaged excitation energy $\bar{E}[m,t]$ to be converted into a single measure of loudness in perceptual units, sone in this case. In the specific loudness converter or conversion function ("Specific Loudness") 120, each band of the excitation is converted into a value of specific loudness, which is measured in sone per ERB. In the loudness combiner or loudness combining function ("Loudness") 122, the values of specific loudness may be integrated or summed across bands to produce the total perceptual loudness.

Specific Loudness Control 124 / Specific Loudness 120 

Multiple Models 

In one aspect, the present invention utilizes a plurality of models in block 120 for converting banded excitation to banded specific loudness. Control information derived from the input audio signal via Specific Loudness Control 124 in the side path selects a model or controls the degree to which a model contributes to the specific loudness. In block 124, certain features or characteristics that are useful for selecting one or more specific loudness models from those available are extracted from the audio. Control signals that indicate which model, or combination of models, should be used are generated from the extracted features or characteristics. Where it may be desirable to use more than one model, the control information may also indicate how such models should be combined.

For example, the per-band specific loudness $N'[m,t]$ may be expressed as a linear combination of the per-band specific loudness of each model, $N'_q[m,t]$:

$$N'[m,t] = \sum_{q=1}^{Q} \alpha_q[m,t]\,N'_q[m,t] \tag{12}$$

where $Q$ indicates the total number of models and the control information $\alpha_q[m,t]$ represents the weighting or contribution of each model. The sum of the weightings may or may not equal one, depending on the models being used.

Although the invention is not limited to them, two models have been found to give accurate results. One model performs best when the audio signal is characterized as narrowband, and the other performs best when the audio signal is characterized as wideband.

Initially, in computing specific loudness, the excitation level in each band of $\bar{E}[m,t]$ may be transformed to an equivalent excitation level at 1 kHz as specified by the equal loudness contours of ISO 226 (FIG. 7), normalized by the transmission filter $P(z)$ (FIG. 8):

$$E_{1kHz}[m,t] = L_{1kHz}\left(\bar{E}[m,t],\, f_c[m]\right) \tag{13}$$

where $L_{1kHz}(E,f)$ is a function that generates the level at 1 kHz that is equally loud to level $E$ at frequency $f$. In practice, $L_{1kHz}(E,f)$ is implemented as an interpolation of a look-up table of the equal loudness contours, normalized by the transmission filter. Transformation to equivalent levels at 1 kHz simplifies the following specific loudness calculation.
Next, the specific loudness in each band may be computed as:

$$N'[m,t] = \alpha[m,t]\,N'_{NB}[m,t] + \left(1 - \alpha[m,t]\right)N'_{WB}[m,t] \tag{14}$$

where $N'_{NB}[m,t]$ and $N'_{WB}[m,t]$ are specific loudness values based on a narrowband and a wideband signal model, respectively. The value $\alpha[m,t]$ is an interpolation factor lying between 0 and 1 that is computed from the audio signal, the details of which are described below.

The narrowband and wideband specific loudness values $N'_{NB}[m,t]$ and $N'_{WB}[m,t]$ may be estimated from the banded excitation using the exponential functions:

$$N'_{NB}[m,t] = \begin{cases} G_{NB}\left[\left(\dfrac{E_{1kHz}[m,t]}{10^{TQ_{1kHz}/10}}\right)^{\beta_{NB}} - 1\right], & E_{1kHz}[m,t] > 10^{TQ_{1kHz}/10} \\ 0, & \text{otherwise} \end{cases} \tag{15a}$$

$$N'_{WB}[m,t] = \begin{cases} G_{WB}\left[\left(\dfrac{E_{1kHz}[m,t]}{10^{TQ_{1kHz}/10}}\right)^{\beta_{WB}} - 1\right], & E_{1kHz}[m,t] > 10^{TQ_{1kHz}/10} \\ 0, & \text{otherwise} \end{cases} \tag{15b}$$

where $TQ_{1kHz}$ is the excitation level at threshold in quiet for a 1 kHz tone. From the equal loudness contours (FIGS. 7 and 8), $TQ_{1kHz}$ equals 4.2 dB. Note that both of these specific loudness functions are equal to zero when the excitation is equal to the threshold in quiet. For excitations greater than the threshold in quiet, both functions grow monotonically with a power law in accordance with Stevens' law of intensity sensation. The exponent for the narrowband function is chosen to be larger than that of the wideband function, making the narrowband function increase more rapidly than the wideband function. The specific selection of exponents $\beta$ and gains $G$ for the narrowband and wideband cases is discussed below.
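Equations 14, 15a and 15b can be sketched per band as follows. This assumes $E_{1kHz}[m,t]$ has already been produced by Equation 13 and is expressed as a linear power value on the same scale as the threshold $10^{TQ_{1kHz}/10}$; the constants are the least-squares values given in Equations 17a-17d, and the function names are hypothetical.

```python
TQ_1KHZ_DB = 4.2                 # threshold in quiet for a 1 kHz tone, dB
G_NB, BETA_NB = 0.0404, 0.279    # narrowband gain and exponent (Eqs. 17a, 17b)
G_WB, BETA_WB = 0.058, 0.212     # wideband gain and exponent (Eqs. 17c, 17d)


def _power_law(e_1khz, gain, beta):
    """Equations 15a/15b: zero at or below threshold, power law above it."""
    tq = 10.0 ** (TQ_1KHZ_DB / 10.0)   # threshold as a linear power value
    if e_1khz <= tq:
        return 0.0
    return gain * ((e_1khz / tq) ** beta - 1.0)


def specific_loudness(e_1khz, alpha):
    """Equation 14: blend of the narrowband and wideband models.

    e_1khz -- equivalent 1 kHz excitation E_1kHz[m, t] (linear power value)
    alpha  -- interpolation factor alpha[m, t] in [0, 1] (1 = narrowband)
    Returns N'[m, t] in sone per ERB.
    """
    n_nb = _power_law(e_1khz, G_NB, BETA_NB)
    n_wb = _power_law(e_1khz, G_WB, BETA_WB)
    return alpha * n_nb + (1.0 - alpha) * n_wb
```

Because the narrowband exponent is larger, an excitation 100 dB above threshold gives roughly three times more specific loudness with `alpha = 1` than with `alpha = 0`.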

Loudness 122

Loudness 122 uses the banded specific loudness of Specific Loudness 120 to create a single loudness measure for the audio signal, namely an output at terminal 123 that is a loudness value in perceptual units. The loudness measure may have arbitrary units, as long as the comparison of loudness values for different audio signals indicates which is louder and which is softer.

The total loudness expressed in units of sone may be computed as the sum of the specific loudness for all frequency bands:



$$S[t] = \Delta \sum_{m=1}^{M} N'[m,t] \tag{16}$$

where $\Delta$ is the ERB spacing specified in Equation 6b. The parameters $G_{NB}$ and $\beta_{NB}$ in Equation 15a are chosen so that when $\alpha[m,t] = 1$, a plot of $S$ in sone versus SPL for a 1 kHz tone substantially matches the corresponding experimental data presented by Zwicker (the circles in FIG. 9) (E. Zwicker, H. Fastl, "Psychoacoustics - Facts and Models," supra). The parameters $G_{WB}$ and $\beta_{WB}$ in Equation 15b are chosen so that when $\alpha[m,t] = 0$, a plot of $S$ in sone versus SPL for uniform exciting noise (noise with equal power in each ERB) substantially matches the corresponding results from Zwicker (the squares in FIG. 9). A least-squares fit to Zwicker's data yields:

$$G_{NB} = 0.0404 \tag{17a}$$

$$\beta_{NB} = 0.279 \tag{17b}$$

$$G_{WB} = 0.058 \tag{17c}$$

$$\beta_{WB} = 0.212 \tag{17d}$$

FIG. 9 (solid lines) shows plots of loudness for both uniform exciting noise and a 1 kHz tone.

Specific Loudness Control 124

As previously mentioned, two models of specific loudness are used in a practical embodiment (Equations 15a and 15b), one for narrowband and one for wideband signals. Specific Loudness Control 124 in the side path calculates a measure, $\alpha[m,t]$, of the degree to which the input signal is either narrowband or wideband in each band. In a general sense, $\alpha[m,t]$ should equal one when the signal is narrowband near the center frequency $f_c[m]$ of a band and zero when the signal is wideband near the center frequency $f_c[m]$ of a band. The control should vary continuously between the two extremes for varying mixtures of such features. As a simplification, the control $\alpha[m,t]$ may be chosen as constant across the bands, in which case $\alpha[m,t]$ is subsequently referred to as $\alpha[t]$, omitting the band index $m$. The control $\alpha[t]$ then represents a measure of how narrowband the signal is across all bands. Although a suitable method for generating such a control is described next, the particular method is not critical and other suitable methods may be employed.

The control $\alpha[t]$ may be computed from the excitation $E[m,t]$ at the output of Filterbank 104 rather than through some other processing of the signal $x[n]$. $E[m,t]$ may provide an adequate reference from which the "narrowbandedness" and "widebandedness" of $x[n]$ are measured, and as a result, $\alpha[t]$ may be generated with little added computation.

"Spectral flatness" is a feature of $E[m,t]$ from which $\alpha[t]$ may be computed. Spectral flatness, as defined by Jayant and Noll (N. S. Jayant, P. Noll, Digital Coding of Waveforms, Prentice Hall, New Jersey, 1984), is the ratio of the geometric mean to the arithmetic mean, where the mean is taken across frequency (index $m$ in the case of $E[m,t]$). When $E[m,t]$ is constant across $m$, the geometric mean is equal to the arithmetic mean, and the spectral flatness equals one. This corresponds to the wideband case. If $E[m,t]$ varies significantly across $m$, then the geometric mean is significantly smaller than the arithmetic mean, and the spectral flatness approaches zero. This corresponds to the narrowband case. By computing one minus the spectral flatness, one may generate a measure of "narrowbandedness," where zero corresponds to wideband and one to narrowband. Specifically, one may compute one minus a modified spectral flatness of $E[m,t]$:

$$NB[t] = 1 - \frac{\left(\displaystyle\prod_{m=M_l[t]}^{M_u[t]} \frac{E[m,t]}{|P[m]|^2}\right)^{1/(M_u[t]-M_l[t]+1)}}{\dfrac{1}{M_u[t]-M_l[t]+1}\displaystyle\sum_{m=M_l[t]}^{M_u[t]} \frac{E[m,t]}{|P[m]|^2}} \tag{18}$$

where $P[m]$ is equal to the frequency response of the transmission filter $P(z)$ sampled at frequency $\omega = 2\pi f_c[m]/f_s$. Normalization of $E[m,t]$ by the transmission filter may provide better results because application of the transmission filter introduces a "bump" in $E[m,t]$ that tends to inflate the "narrowbandedness" measure.
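Equation 18 for a single time block can be sketched as below, assuming strictly positive excitation values and a precomputed list of transmission-filter magnitudes $|P[m]|$; band indices are 0-based here, and the function name is hypothetical. The geometric mean is taken in the log domain so that products of many small values do not underflow.

```python
import math


def narrowbandedness(E, P_mag, m_lo, m_hi):
    """Equation 18: one minus the spectral flatness of E[m, t] / |P[m]|^2.

    E          -- excitation values E[m, t] for one block t (all positive)
    P_mag      -- |P[m]|, transmission-filter magnitude at each band centre
    m_lo, m_hi -- inclusive band limits M_l[t] and M_u[t]
    """
    vals = [E[m] / P_mag[m] ** 2 for m in range(m_lo, m_hi + 1)]
    n = len(vals)
    geo = math.exp(sum(math.log(v) for v in vals) / n)  # geometric mean
    arith = sum(vals) / n                               # arithmetic mean
    return 1.0 - geo / arith
```

A flat normalized spectrum returns 0 (wideband), while a spectrum dominated by a single band returns a value close to 1 (narrowband). If the excitation merely follows the transmission-filter "bump", the normalization removes it and the result stays at 0.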

Additionally, computing the spectral flatness over a subset of the bands of $E[m,t]$ may yield better results. The lower and upper limits of summation in Equation 18, $M_l[t]$ and $M_u[t]$, define a region that may be smaller than the range of all $M$ bands. It is desired that $M_l[t]$ and $M_u[t]$ include the portion of $E[m,t]$ that contains the majority of its energy, and that the range defined by $M_l[t]$ and $M_u[t]$ be no more than 24 units wide on the ERB scale. More specifically (and recalling that $f_c[m]$ is the center frequency of band $m$ in Hz), one desires:

$$\mathrm{HzToERB}(f_c[M_u[t]]) - \mathrm{HzToERB}(f_c[M_l[t]]) \le 24 \tag{19a}$$

and one requires:

$$\mathrm{HzToERB}(f_c[M_u[t]]) \ge CT[t] \ge \mathrm{HzToERB}(f_c[M_l[t]]) \tag{19b}$$

$$\mathrm{HzToERB}(f_c[M_l[t]]) \ge \mathrm{HzToERB}(f_c[1]) \tag{19c}$$

$$\mathrm{HzToERB}(f_c[M_u[t]]) \le \mathrm{HzToERB}(f_c[M]) \tag{19d}$$

where $CT[t]$ is the spectral centroid of $E[m,t]$ measured on the ERB scale:

$$CT[t] = \frac{\sum_{m=1}^{M} \mathrm{HzToERB}(f_c[m])\,E[m,t]}{\sum_{m=1}^{M} E[m,t]} \tag{19e}$$

Ideally, the limits of summation, $M_l[t]$ and $M_u[t]$, are centered around $CT[t]$ when measured on the ERB scale, but this is not always possible when $CT[t]$ is near the lower or upper limits of its range.
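One way to choose $M_l[t]$ and $M_u[t]$ is sketched below: centre a 24-ERB window on $CT[t]$ and shift it when it would run past the available bands. Two assumptions are made. First, HzToERB is defined elsewhere in the document; the sketch substitutes the common Glasberg and Moore ERB-rate formula 21.4 log10(1 + 0.00437 f), which may differ from the document's own definition. Second, the window-placement rule is an illustration, not necessarily the inventors' procedure. Band limits are snapped inward to band centres, so Equation 19b holds whenever the bands are spaced much more finely than 24 ERB.

```python
import math


def hz_to_erb(f):
    # Assumed ERB-rate formula (Glasberg and Moore), f in Hz.
    return 21.4 * math.log10(1.0 + 0.00437 * f)


def band_limits(E, fc, width_erb=24.0):
    """Return (m_lo, m_hi, CT) for one block (0-based band indices)."""
    erb = [hz_to_erb(f) for f in fc]
    ct = sum(z * e for z, e in zip(erb, E)) / sum(E)      # Equation 19e
    lo, hi = ct - width_erb / 2, ct + width_erb / 2       # ideal: centred on CT
    if lo < erb[0]:                                       # keep Equation 19c
        lo, hi = erb[0], erb[0] + width_erb
    if hi > erb[-1]:                                      # keep Equation 19d
        lo, hi = max(erb[0], erb[-1] - width_erb), erb[-1]
    m_lo = min(m for m, z in enumerate(erb) if z >= lo)
    m_hi = max(m for m, z in enumerate(erb) if z <= hi)
    return m_lo, m_hi, ct
```

For an excitation concentrated near the bottom of the band range, the window is pinned to the first band instead of being centred, which is the situation described at the end of the paragraph above.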

Next, $NB[t]$ may be smoothed over time in a manner analogous to Equation 11a:

$$\widetilde{NB}[t] = \begin{cases} \widetilde{NB}[t-1] + \dfrac{1}{\bar{\alpha}[t]}\left(NB[t] - \widetilde{NB}[t-1]\right), & \sum_m E[m,t] > 10^{t_{dB}/10} \sum_m \tilde{E}[m,t] \\ \widetilde{NB}[t-1], & \text{otherwise} \end{cases} \tag{20}$$

where $\bar{\alpha}[t]$ is equal to the maximum of $\bar{\alpha}[m,t]$, defined in Equation 11b, over all $m$. Lastly, $\alpha[t]$ is computed from $\widetilde{NB}[t]$ as follows:

$$\alpha[t] = \begin{cases} 0, & \Phi\{\widetilde{NB}[t]\} < 0 \\ \Phi\{\widetilde{NB}[t]\}, & 0 \le \Phi\{\widetilde{NB}[t]\} \le 1 \\ 1, & \Phi\{\widetilde{NB}[t]\} > 1 \end{cases} \tag{21a}$$

where

$$\Phi\{x\} = 12.2568x^3 - 22.8320x^2 + 14.5869x - 2.9594 \tag{21b}$$

Although the exact form of $\Phi\{\cdot\}$ is not critical, the polynomial in Equation 21b may be found by optimizing $\alpha[t]$ against the subjectively measured loudness of a large variety of audio material.
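Equation 21 maps the smoothed narrowbandedness onto the control $\alpha[t]$. A minimal sketch follows; `phi` and `narrowband_control` are hypothetical names, and the smoothing of Equation 20 is assumed to have been done already.

```python
def phi(x):
    """Equation 21b."""
    return 12.2568 * x ** 3 - 22.8320 * x ** 2 + 14.5869 * x - 2.9594


def narrowband_control(nb_smoothed):
    """Equation 21a: clamp phi(NB~[t]) to the interval [0, 1]."""
    return min(1.0, max(0.0, phi(nb_smoothed)))
```

The derivative of this polynomial, 36.7704x^2 - 45.664x + 14.5869, has a negative discriminant, so the polynomial increases monotonically. It crosses zero near x = 0.4, which means $\alpha[t]$ stays at 0 until the smoothed narrowbandedness exceeds roughly 0.4, then rises to 1 as it approaches 1.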

FIG. 2 shows a functional block diagram of an embodiment of a loudness measurer or loudness measuring process 200 according to a second aspect of the present invention. Devices or functions 202, 204, 206, 220, 222, 223 and 224 of FIG. 2 correspond to the respective devices or functions 102, 104, 106, 120, 122, 123 and 124 of FIG. 1.

According to the first aspect of the invention, of which FIG. 1 shows an embodiment, the loudness measurer or computation generates a loudness value in perceptual units. In order to adjust the loudness of the input signal, a useful measure is a gain $G[t]$ which, when multiplied with the input signal $x[n]$ (as, for example, in the embodiment of FIG. 3, described below), makes its loudness equal to a reference loudness level $S_{ref}$. The reference loudness $S_{ref}$ may be specified arbitrarily or may be measured from some "known" reference audio signal by another device or process operating in accordance with the first aspect of the invention. Letting $\Psi\{\cdot\}$ represent all the computation performed on signal $x[n]$ to generate loudness $S[t]$, one wants to find $G[t]$ such that

$$S_{ref} = S[t] = \Psi\{G[t]\,x[n],\, t\} \tag{23}$$

Because a portion of the processing embodied in $\Psi\{\cdot\}$ is non-linear, no closed-form solution for $G[t]$ exists, so instead an iterative technique may be utilized to find an approximate solution. At each iteration $i$ in the process, let $G_i$ represent the current estimate of $G[t]$. For every iteration, $G_i$ is updated so that the absolute error from the reference loudness decreases:

$$\left|S_{ref} - \Psi\{G_i\,x[n],\, t\}\right| < \left|S_{ref} - \Psi\{G_{i-1}\,x[n],\, t\}\right| \tag{24}$$

There exist many suitable techniques for updating $G_i$ in order to achieve the above decrease in error. One such method is gradient descent (see Nonlinear Programming by Dimitri P. Bertsekas, Athena Scientific, Belmont, MA, 1995), in which $G_i$ is updated by an amount proportional to the error at the previous iteration:

$$G_i = G_{i-1} + \mu\left(S_{ref} - \Psi\{G_{i-1}\,x[n],\, t\}\right) \tag{25}$$

where $\mu$ is the step size of the iteration. The above iteration continues until the absolute error is below some threshold, until the number of iterations has reached some predefined maximum limit, or until a specified time has passed. At that point $G[t]$ is set equal to $G_i$.
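The update of Equation 25, together with the stopping rules just described (error threshold or iteration limit), can be sketched as follows. The loudness computation $\Psi$ is replaced by a hypothetical callable `loudness_of_gain`, and the toy model `2 * G**0.6` is illustrative only; it is not the loudness model of this document.

```python
def match_gain(loudness_of_gain, s_ref, g0=1.0, mu=0.01, tol=1e-6, max_iter=10000):
    """Iterate Equation 25 until |S_ref - loudness| < tol or max_iter is hit.

    loudness_of_gain -- hypothetical stand-in for Psi{G x[n], t}
    s_ref            -- reference loudness S_ref
    g0               -- initial gain estimate
    mu               -- step size
    """
    g = g0
    for _ in range(max_iter):
        err = s_ref - loudness_of_gain(g)   # error at the previous estimate
        if abs(err) < tol:
            break
        g += mu * err                       # Equation 25
    return g


# Toy monotonic loudness model (illustrative only): S = 2 * G**0.6.
g = match_gain(lambda gain: 2.0 * gain ** 0.6, s_ref=4.0, mu=0.1)
```

A fixed step size converges locally only if |1 - mu * dS/dG| < 1 near the solution. A step that is too large makes the estimate overshoot and oscillate or diverge.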

Referring back to Equations 6-8, one notes that the excitation of the signal $x[n]$ is obtained through linear operations on the square of the signal's STDFT magnitude, $|X[k,t]|^2$. It follows that the excitation resulting from a gain-modified signal $Gx[n]$ is equal to the excitation of $x[n]$ multiplied by $G^2$. Furthermore, the temporal integration required to estimate long-term perceived loudness may be performed through linear time-averaging of the excitation, and therefore the time-averaged excitation corresponding to $Gx[n]$ is equal to the time-averaged excitation of $x[n]$ multiplied by $G^2$. As a result, the time averaging need not be recomputed over the entire input signal history for every re-evaluation of $\Psi\{G_i x[n], t\}$ in the iterative process described above. Instead, the time-averaged excitation $\bar{E}[m,t]$ may be computed only once from $x[n]$, and, in the iteration, updated values of loudness may be computed by applying the square of the updated gain directly to $\bar{E}[m,t]$. Specifically, letting $\Psi_E\{\bar{E}[m,t]\}$ represent all the processing performed on the time-averaged excitation $\bar{E}[m,t]$ to generate $S[t]$, the following relationship holds for a general multiplicative gain $G$:

$$\Psi_E\{G^2\,\bar{E}[m,t]\} = \Psi\{G\,x[n],\, t\} \tag{26}$$

Using this relationship, the iterative process may be simplified by replacing $\Psi\{G_i x[n], t\}$ with $\Psi_E\{G_i^2\,\bar{E}[m,t]\}$. This simplification would not be possible had the temporal integration required to estimate long-term perceived loudness been performed after the non-linear transformation to specific loudness.
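The scaling property behind Equation 26 can be checked numerically with a simplified stand-in for Equations 6-8. The sketch assumes a plain unwindowed DFT, made-up non-negative band weights in place of the transmission filter and filterbank, and a plain mean over frames in place of Equation 10. All function names and data are hypothetical. It shows that the averaged excitation of $Gx[n]$ equals $G^2$ times that of $x[n]$, so the average can be cached and rescaled inside the iteration.

```python
import cmath


def power_spectrum(frame):
    """|X[k]|^2 for k = 0 .. N/2 of one frame (plain DFT, no window)."""
    n = len(frame)
    return [abs(sum(frame[i] * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i in range(n))) ** 2
            for k in range(n // 2 + 1)]


def averaged_excitation(frames, weights):
    """Linear banding of |X[k]|^2, then a plain mean over frames.

    weights[m][k] >= 0 are hypothetical band weights.
    """
    per_frame = []
    for frame in frames:
        p = power_spectrum(frame)
        per_frame.append([sum(w * pk for w, pk in zip(row, p)) for row in weights])
    return [sum(e[m] for e in per_frame) / len(per_frame) for m in range(len(weights))]


frames = [[0.3, -1.2, 0.5, 2.0, -0.7, 0.1, 0.9, -0.4],
          [1.0, 0.0, -1.0, 0.5, 0.25, -0.5, 0.0, 2.0]]
weights = [[1.0, 0.5, 0.0, 0.0, 0.0],
           [0.0, 0.5, 1.0, 0.5, 0.0],
           [0.0, 0.0, 0.0, 0.5, 1.0]]
gain = 1.7
base = averaged_excitation(frames, weights)
scaled = averaged_excitation([[gain * v for v in f] for f in frames], weights)
# Each band of `scaled` equals gain**2 times the same band of `base`.
```

The same holds for the thresholded smoother of Equation 11: both sides of its comparison scale by $G^2$, so the decisions it makes do not depend on the gain.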

The iterative process for computing $G[t]$ is depicted in FIG. 2. The output loudness $S[t]$ at terminal 223 may be subtracted, in a subtractive combiner or combining function 231, from the reference loudness $S_{ref}$ at terminal 230. The resulting error signal 232 is fed into an iterative gain updater or updating function ("Iterative Gain Update") 233 that generates the next gain $G_i$ in the iteration. The square of this gain, $G_i^2$, is then fed back at output 234 to multiplicative combiner 208, where $G_i^2$ is multiplied with the time-averaged excitation signal from block 206. The next value of $S[t]$ in the iteration is then computed from this gain-modified version of the time-averaged excitation through blocks 220 and 222. The described loop iterates until the termination conditions are met, at which time the gain $G[t]$ at terminal 235 is set equal to the current value of $G_i$. The final value $G[t]$ may be computed through the described iterative process, for example, for every FFT frame $t$, or just once at the end of an audio segment after the excitation has been averaged over the entire length of that segment.

WO 2004/111994 PCT/US2004/016964

If one wishes to compute the non-gain-modified signal loudness in conjunction with this iterative process, the gain $G_i$ can be initialized to one at the beginning of each iterative process for each time period $t$. This way, the first value of $S[t]$ computed in the loop represents the original signal loudness and can be recorded as such. If one does not wish to record this value, however, $G_i$ may be initialized with any value. When $G[t]$ is computed over consecutive time frames and one does not wish to record the original signal loudness, it may be desirable to initialize $G_i$ to the value of $G[t]$ from the previous time period. This way, if the signal has not changed significantly from the previous time period, it is likely that the value $G[t]$ will have remained substantially the same, and only a few iterations will be required to converge to the proper value.

Once the iterations are complete, $G[t]$ represents the gain to be applied to the input audio signal at 201 by some external device such that the loudness of the modified signal matches the reference loudness. FIG. 3 shows one suitable arrangement in which the gain $G[t]$ from the Iterative Gain Update 233 is applied to a control input of a signal level controlling device or function, such as a voltage controlled amplifier (VCA) 236, in order to provide a gain-adjusted output signal. VCA 236 in FIG. 3 may be replaced by a human operator controlling a gain adjuster in response to a sensory indication of the gain $G[t]$ on line 235. A sensory indication may be provided by a meter, for example. The gain $G[t]$ may be subject to time smoothing (not shown).
For some signals, an alternative to the smoothing described in Equations 10 and 11 may be desirable for computing the long-term perceived loudness. Listeners tend to associate the long-term loudness of a signal with the loudest portions of that signal. As a result, the smoothing presented in Equations 10 and 11 may underestimate the perceived loudness of a signal containing long periods of relative silence interrupted by shorter segments of louder material. Such signals are often found in film soundtracks, with short segments of dialog surrounded by longer periods of ambient scene noise. Even with the thresholding presented in Equation 11, the quiet portions of such signals may contribute too heavily to the time-averaged excitation $\bar{E}[m,t]$.

To deal with this problem, a statistical technique for computing the long-term loudness may be employed in a further aspect of the present invention. First, the smoothing time constant in Equations 10 and 11 is made very small and $t_{dB}$ is set to minus infinity, so that $\bar{E}[m,t]$ represents the "instantaneous" excitation. In this case, the smoothing parameter $\lambda_m$ may be chosen to vary across the bands $m$ to more accurately model the manner in which perception of instantaneous loudness varies across frequency. In practice, however, choosing $\lambda_m$ to be constant across $m$ still yields acceptable results. The remainder of the previously described algorithm operates unchanged, resulting in an instantaneous loudness signal $S[t]$, as specified in Equation 16. Over some range $t_1 \le t \le t_2$, the long-term loudness $S_p[t_1,t_2]$ is then defined as a value which is greater than $S[t]$ for $p$ percent of the time values in the range and less than $S[t]$ for $100-p$ percent of the time values in the range. Experiments have shown that setting $p$ equal to roughly 90% matches subjectively perceived long-term loudness. With this setting, only 10% of the values of $S[t]$ need be significant to affect the long-term loudness. The other 90% of the values can be relatively silent without lowering the long-term loudness measure.

The value $S_p[t_1,t_2]$ can be computed by sorting the values $S[t]$, $t_1 \le t \le t_2$, in ascending order into a list $S_{sort}\{i\}$, $0 \le i \le t_2 - t_1$, where $i$ indexes the $i$th element of the sorted list. The long-term loudness is then given by the element that is $p$ percent of the way into the list:

$$S_p[t_1,t_2] = S_{sort}\{\mathrm{round}(p\,(t_2 - t_1)/100)\} \tag{27}$$
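Equation 27 amounts to a percentile lookup in the sorted list. A minimal sketch, assuming the instantaneous loudness values for the range are held in a Python list so that `len(S) - 1` plays the role of $t_2 - t_1$; the function name is hypothetical.

```python
def long_term_loudness(S, p=90.0):
    """Equation 27: S_p = S_sort{round(p * (t2 - t1) / 100)}.

    S -- instantaneous loudness values S[t] for t1 <= t <= t2
    p -- percentage (roughly 90 is suggested in the text)
    """
    s_sort = sorted(S)                      # ascending order
    return s_sort[round(p * (len(S) - 1) / 100.0)]
```

For example, with eight near-silent values of 0.1 and two loud values of 5.0 and 6.0, the result is 5.0. The two loud blocks, only 20% of the range, set the long-term loudness. Python's `round` rounds exact halves to even; any consistent rounding rule serves the same purpose here.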

By itself, the above computation is relatively straightforward. However, if one wishes to compute a gain $G_p[t_1,t_2]$ which, when multiplied with $x[n]$, results in $S_p[t_1,t_2]$ being equal to some reference loudness $S_{ref}$, the computation becomes significantly more complex. As before, an iterative approach is required, but now the long-term loudness measure $S_p[t_1,t_2]$ is dependent on the entire range of values $S[t]$, $t_1 \le t \le t_2$, each of which must be updated with each update of $G_i$ in the iteration. In order to compute these updates, the signal $\bar{E}[m,t]$ must be stored over the entire range $t_1 \le t \le t_2$. In addition, since the dependence of $S[t]$ on $G_i$ is non-linear, the relative ordering of $S[t]$, $t_1 \le t \le t_2$, may change with each iteration, and therefore $S_{sort}\{i\}$ must also be recomputed. The need for re-sorting is readily evident when considering short-time signal segments whose spectrum is just below the threshold of hearing for a particular gain in the iteration. When the gain is increased, a significant portion of the segment's spectrum may become audible, which may make the total loudness of the segment greater than that of other, narrowband segments of the signal which were previously audible. When the range $t_1 \le t \le t_2$ becomes large, or if one desires to compute the gain $G_p[t_1,t_2]$ continuously as a function of a sliding time window, the computational and memory costs of this iterative process may become prohibitive.
A significant savings in computation and memory is achieved by realizing that $S[t]$ is a monotonically increasing function of $G_i$. In other words, increasing $G_i$ always increases the short-term loudness at each time instant. With this knowledge, the desired matching gain $G_p[t_1,t_2]$ can be efficiently computed as follows. First, compute the previously defined matching gain $G[t]$ from $\bar{E}[m,t]$, using the described iteration, for all values of $t$ in the range $t_1 \le t \le t_2$. Note that for each value $t$, $G[t]$ is computed by iterating on the single value $\bar{E}[m,t]$. Next, the long-term matching gain $G_p[t_1,t_2]$ is computed by sorting the values $G[t]$, $t_1 \le t \le t_2$, in ascending order into a list $G_{sort}\{i\}$, $0 \le i \le t_2 - t_1$, and then setting

$$G_p[t_1,t_2] = G_{sort}\{\mathrm{round}((100 - p)(t_2 - t_1)/100)\} \tag{28}$$

We now argue that $G_p[t_1,t_2]$ is equal to the gain which, when multiplied with $x[n]$, results in $S_p[t_1,t_2]$ being equal to the desired reference loudness $S_{ref}$. Note from Equation 28 that $G[t] < G_p[t_1,t_2]$ for $100-p$ percent of the time values in the range $t_1 \le t \le t_2$ and that $G[t] > G_p[t_1,t_2]$ for the other $p$ percent. For those values of $G[t]$ such that $G[t] < G_p[t_1,t_2]$, one notes that if $G_p[t_1,t_2]$ were applied to the corresponding values of $\bar{E}[m,t]$ instead of $G[t]$, then the resulting values of $S[t]$ would be greater than the desired reference loudness. This is true because $S[t]$ is a monotonically increasing function of the gain. Similarly, if $G_p[t_1,t_2]$ were applied to the values of $\bar{E}[m,t]$ corresponding to $G[t]$ such that $G[t] > G_p[t_1,t_2]$, the resulting values of $S[t]$ would be less than the desired reference loudness. Therefore, application of $G_p[t_1,t_2]$ to all values of $\bar{E}[m,t]$ in the range $t_1 \le t \le t_2$ results in $S[t]$ being greater than the desired reference $100-p$ percent of the time and less than the reference $p$ percent of the time. In other words, $S_p[t_1,t_2]$ equals the desired reference.
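The argument above can be checked with a toy example. This sketch assumes a hypothetical monotonic loudness model, S = c[t] * G**0.6, where the made-up values `c` stand in for the per-block excitation. Under that assumption, each $G[t]$ has a closed form, so no iteration is needed. The sketch computes $G_p[t_1,t_2]$ by a single sort (Equation 28), applies it to every block, and confirms that the percentile of Equation 27 lands on $S_{ref}$.

```python
def long_term_gain(G, p=90.0):
    """Equation 28: G_p = G_sort{round((100 - p) * (t2 - t1) / 100)}."""
    g_sort = sorted(G)
    return g_sort[round((100.0 - p) * (len(G) - 1) / 100.0)]


def percentile_value(values, p=90.0):
    """Equation 27 applied to any list of values."""
    return sorted(values)[round(p * (len(values) - 1) / 100.0)]


# Hypothetical per-block factors c[t] for the toy model S = c[t] * G**0.6.
c = [0.2, 3.0, 0.5, 1.7, 0.05, 2.2, 0.9, 4.0, 1.1, 0.3, 2.8]
s_ref = 1.0
G = [(s_ref / ci) ** (1 / 0.6) for ci in c]   # per-block matching gains G[t]
g_p = long_term_gain(G)                        # one sort, no joint iteration
S = [ci * g_p ** 0.6 for ci in c]              # loudness with G_p applied
print(percentile_value(S))                     # close to s_ref
```

Only the list `G` had to be kept. Sorting it once replaces the repeated re-sorting of $S[t]$ that the direct approach would need on every iteration.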

This alternate method of computing the matching gain obviates the need to store $\bar{E}[m,t]$ and $S[t]$ over the range $t_1 \le t \le t_2$. Only $G[t]$ need be stored. In addition, for every value of $G_p[t_1,t_2]$ that is computed, the sorting of $G[t]$ over the range $t_1 \le t \le t_2$ need only be performed once, as opposed to the previous approach, where $S[t]$ must be re-sorted with every iteration. In the case where $G_p[t_1,t_2]$ is to be computed continuously over a sliding window of length $T$ (i.e., $t_1 = t - T$, $t_2 = t$), the list $G_{sort}\{i\}$ can be maintained efficiently by simply removing one value from, and adding one value to, the sorted list for each new time instance. When the range $t_1 \le t \le t_2$ becomes extremely large (the length of an entire song or film, for example), the memory required to store $G[t]$ may still be prohibitive. In this case, $G_p[t_1,t_2]$ may be approximated from a discretized histogram of $G[t]$. In practice, this histogram is created from $G[t]$ in units of decibels. The histogram may be computed as

$H[i]$ = number of samples in the range $t_1 \le t \le t_2$ such that

$$\Delta_{dB}\, i + dB_{min} \le 20\log_{10} G[t] < \Delta_{dB}\,(i+1) + dB_{min} \tag{29}$$

where $\Delta_{dB}$ is the histogram resolution and $dB_{min}$ is the histogram minimum. The matching gain, in decibels, is then approximated as

$$G_p[t_1,t_2] = \Delta_{dB}\, i_p + dB_{min} \tag{30a}$$

where $i_p$ satisfies

$$\frac{100 \sum_{i=0}^{i_p} H[i]}{\sum_{i=0}^{I} H[i]} = 100 - p \tag{30b}$$

and $I$ is the maximum histogram index. Using the discretized histogram, only $I + 1$ values need be stored, and $G_p[t_1,t_2]$ is easily updated with each new value of $G[t]$.
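Equations 29 and 30 can be sketched as below. The bin width, minimum, and bin count are illustrative choices. Gains outside the histogram range are clamped into the end bins, which is an assumption not stated in the text. Because exact equality in Equation 30b is rarely reachable with integer counts, the sketch takes the first index at which the cumulative count reaches $100 - p$ percent. The result is the lower edge of that bin in dB, so it can differ from the exact sorted-list value of Equation 28 by up to one bin width.

```python
import math


def histogram_gain_db(G, p=90.0, delta_db=0.5, db_min=-40.0, n_bins=160):
    """Approximate G_p[t1, t2] in dB from a histogram of 20*log10(G[t])."""
    H = [0] * n_bins
    for g in G:                                              # Equation 29
        i = int(math.floor((20.0 * math.log10(g) - db_min) / delta_db))
        H[min(max(i, 0), n_bins - 1)] += 1                   # clamp to end bins
    total = sum(H)
    cum = 0
    for i, h in enumerate(H):                                # Equation 30b
        cum += h
        if 100.0 * cum >= (100.0 - p) * total:
            return delta_db * i + db_min                     # Equation 30a
    return delta_db * (n_bins - 1) + db_min
```

In a streaming setting only `H` needs to persist between calls. Each new $G[t]$ increments a single bin, and the cumulative scan is bounded by the fixed number of bins rather than by the length of the program.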

Other methods for approximating $G_p[t_1,t_2]$ from $G[t]$ may be conceived, and this invention is intended to include such techniques. The key aspect of this portion of the invention is to perform some type of smoothing on the matching gain $G[t]$ to generate the long-term matching gain $G_p[t_1,t_2]$, rather than processing the instantaneous loudness $S[t]$ to generate the long-term loudness $S_p[t_1,t_2]$, from which $G_p[t_1,t_2]$ is then estimated through an iterative process.

Figures 10 and 11 display systems similar to Figures 2 and 3, respectively, but in which smoothing (device or function 237) of the matching gain $G[t]$ is used to generate a smoothed gain signal $G_p[t_1,t_2]$ (signal 238).

The reference loudness at input 230 (FIGS. 2, 3, 10, 11) may be "fixed" or "variable," and the source of the reference loudness may be internal or external to an arrangement embodying aspects of the invention. For example, the reference loudness may be set by a user, in which case its source is external and it may remain "fixed" for a period of time until it is reset by the user. Alternatively, the reference loudness may be a measure of the loudness of another audio source derived from a loudness measuring process or device according to the present invention, such as the arrangement shown in the example of FIG. 1.

The normal volume control of an audio-producing device may be replaced by a process or device in accordance with aspects of the invention, such as the examples of FIG. 3 or FIG. 11. In that case, the user-operated volume knob, slider, etc. would control the reference loudness at 230 of FIG. 3 or FIG. 11 and, consequently, the audio-producing device would have a loudness commensurate with the user's adjustment of the volume control.

An example of a variable reference is shown in FIG. 12, where the reference loudness $S_{ref}$ is replaced by a variable reference $S_{ref}[t]$ that is computed, for example, from the loudness signal $S[t]$ through a variable reference loudness device or function ("Variable Reference Loudness") 239. In this arrangement, at the beginning of each iteration for each time period $t$, the variable reference $S_{ref}[t]$ may be computed from the unmodified loudness $S[t]$, before any gain has been applied to the excitation at 208. The dependence of $S_{ref}[t]$ on $S[t]$ through variable reference loudness function 239 may take various forms to achieve various effects. For example, the function may simply scale $S[t]$ to generate a reference that is some fixed ratio of the original loudness. Alternatively, the function might produce a reference greater than $S[t]$ when $S[t]$ is below some threshold and less than $S[t]$ when $S[t]$ is above some threshold, thus reducing the dynamic range of the perceived loudness of the audio. Whatever the form of this function, the previously described iteration is performed to compute $G[t]$ such that

$$\Psi_E\{G^2[t]\,\bar{E}[m,t]\} = S_{ref}[t] \tag{31}$$

The matching gain $G[t]$ may then be smoothed as described above, or through some other suitable technique, to achieve the desired perceptual effect. Finally, a delay 240 between the audio signal 201 and VCA block 236 may be introduced to compensate for any latency in the computation of the smoothed gain. Such a delay may also be provided in the arrangements of FIGS. 3 and 11.
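One possible form of the threshold-based variable reference function 239 is sketched below. The power-law mapping, the name `variable_reference`, and the parameters `ratio` and `s_pivot` are hypothetical illustrations. The text leaves the form of this function open. Below `s_pivot` the reference exceeds $S[t]$, and above it the reference falls below $S[t]$, which is the dynamic-range-reducing behaviour described above.

```python
def variable_reference(s, ratio=0.5, s_pivot=1.0):
    """Hypothetical compressive mapping from unmodified loudness S[t] to S_ref[t].

    ratio = 1.0 leaves the loudness unchanged; smaller values compress more.
    s_pivot is the loudness that maps onto itself (the threshold).
    """
    if s <= 0.0:
        return 0.0
    return s_pivot * (s / s_pivot) ** ratio
```

On log-log axes this mapping is a straight line of slope `ratio` through `s_pivot`, much like the static curve of a conventional compressor, except that it acts on perceived loudness. The value it returns would then be used as $S_{ref}[t]$ in the iteration of Equation 31.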

The gain control signal $G[t]$ of the FIG. 3 arrangement and the smoothed gain control signal $G_p[t_1,t_2]$ of the FIG. 11 arrangement may be useful in a variety of applications, including, for example, broadcast television or satellite radio, where the perceived loudness varies across different channels. In such environments, the apparatus or method of the present invention may compare the audio signal from each channel with a reference loudness level (or the loudness of a reference signal). An operator or an automated device may use the gain to adjust the loudness of each channel. All channels would thus have substantially the same perceived loudness. FIG. 13 shows an example of such an arrangement, in which the audio from a plurality of television or audio channels, 1 through N, is applied to the respective inputs 201 of processes or devices 250, 252, each being in accordance with aspects of the invention as shown in FIGS. 3 or 11. The same reference loudness level is applied to each of the processes or devices 250, 252, resulting in loudness-adjusted 1st-channel through Nth-channel audio at each output 236.

The measurement and gain adjustment technique may also be applied to a real-time
measurement device that monitors input audio material, performs processing
that identifies audio content primarily containing human speech signals, and
computes a gain such that the speech signals substantially match a previously
defined reference level. Suitable techniques for identifying speech in audio material
are set forth in U.S. Patent Application S.N. 10/233,073, filed August 30, 2002 and
published as U.S. Patent Application Publication US 2004/0044525 A1, published
March 4, 2004. Said application is hereby incorporated by reference in its entirety.
Because audience annoyance with loud audio content tends to be focused on the
speech portions of program material, such a measurement and gain adjustment method
may greatly reduce annoying level differences in audio commonly used in television,
film and music material.
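Restricting the measurement to speech can be sketched as gating per-frame loudness values with a speech/non-speech decision before averaging them. The speech detector itself is outside this sketch: the per-frame flags are assumed to come from some external classifier, such as one built along the lines of the application cited above, and both the function name `speech_gated_loudness` and the plain arithmetic mean are hypothetical illustrative choices.

```python
from typing import Optional


def speech_gated_loudness(frame_loudness: list[float],
                          is_speech: list[bool]) -> Optional[float]:
    """Mean loudness over the frames flagged as speech.

    `is_speech` is assumed to come from an external speech detector. Returns None
    when no frame is flagged, leaving it to the caller to decide what to do (for
    example, keep the previous gain).
    """
    if len(frame_loudness) != len(is_speech):
        raise ValueError("expected one speech flag per loudness value")
    speech = [s for s, flag in zip(frame_loudness, is_speech) if flag]
    if not speech:
        return None
    return sum(speech) / len(speech)


# Hypothetical per-frame loudness values; the loud non-speech frame (9.0) is ignored
print(speech_gated_loudness([2.0, 9.0, 4.0, 1.0], [True, False, True, False]))  # 3.0
```

The gated value would then take the place of the overall loudness when computing the gain against the reference level.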
Implementation

The invention may be implemented in hardware or software, or a combination
of both (e.g., programmable logic arrays). Unless otherwise specified, the algorithms
included as part of the invention are not inherently related to any particular computer
or other apparatus. In particular, various general-purpose machines may be used
with programs written in accordance with the teachings herein, or it may be more
convenient to construct more specialized apparatus (e.g., integrated circuits) to
perform the required method steps. Thus, the invention may be implemented in one
or more computer programs executing on one or more programmable computer
systems, each comprising at least one processor, at least one data storage system
(including volatile and non-volatile memory and/or storage elements), at least one
input device or port, and at least one output device or port. Program code is applied
to input data to perform the functions described herein and generate output
information. The output information is applied to one or more output devices, in
known fashion.

Each such program may be implemented in any desired computer language
(including machine, assembly, or high-level procedural, logical, or object-oriented
programming languages) to communicate with a computer system. In any case, the
language may be a compiled or interpreted language.

Each such computer program is preferably stored on or downloaded to a
storage medium or device (e.g., solid-state memory or media, or magnetic or optical
media) readable by a general or special purpose programmable computer, for
configuring and operating the computer when the storage medium or device is read by
the computer system to perform the procedures described herein. The inventive
system may also be considered to be implemented as a computer-readable storage
medium, configured with a computer program, where the storage medium so
configured causes a computer system to operate in a specific and predefined manner
to perform the functions described herein.

A number of embodiments of the invention have been described.
Nevertheless, it will be understood that various modifications may be made without
departing from the spirit and scope of the invention. For example, some of the steps
described above may be order independent, and thus can be performed in an order
different from that described. Accordingly, other embodiments are within the scope
of the following claims.



Claims

1. A method for processing an audio signal, comprising
producing, in response to the audio signal, an excitation signal, and
calculating the perceptual loudness of the audio signal in response to the
excitation signal and a measure of characteristics of the audio signal, wherein said
calculating selects, from a group of two or more specific loudness model functions,
one or a combination of two or more of the specific loudness model functions, the
selection of which is controlled by the measure of characteristics of the input audio
signal.



2. A method according to claim 1 wherein the measure of characteristics of 
the audio signal is a measure of the degree to which the input signal is narrowband or 
wideband. 



3. A method according to claim 2 further comprising calculating the degree to
which the input signal is narrowband or wideband by calculating the spectral flatness
of the input signal.



4. A method according to claim 1 wherein said calculating selects from or
combines two specific loudness model functions, a first loudness model function
being selected by a measure of characteristics resulting from a narrowband input
signal, a second loudness model function being selected by a measure of
characteristics resulting from a wideband input signal, and a combination of the first
and second loudness model functions being selected by a measure of characteristics
resulting from a partly narrowband, partly wideband input signal.



5. A method according to claim 4 wherein both the first and second loudness
model functions increase monotonically above a threshold in quiet with increasing
excitation according to a power law, the first loudness model function increasing
faster than the second loudness model function.
6. A method according to claim 1 wherein said calculating selects from a
group of two or more specific loudness models, one or a combination of two or more 
of said specific loudness models in each of respective frequency bands of the 
excitation signal. 

7. A method according to claim 1 wherein said calculating selects from a 
group of two or more specific loudness models, one or a combination of two or more 
of said specific loudness models in a group of respective frequency bands of the 
excitation signal. 

8. A method according to claim 7 wherein the group of respective frequency 
bands are all of the frequency bands of the excitation signal. 


9. A method according to claim 1 wherein the measure of characteristics of the 
audio signal is derived from the excitation signal. 

10. A method according to claim 1 wherein the calculating includes 
calculating a specific loudness in each of respective frequency bands of the excitation 
signal. 

11. A method according to claim 10 wherein the calculating further comprises
selecting the specific loudness of a frequency band to provide the perceptual
loudness or combining the specific loudness of a group of frequency bands to provide
the perceptual loudness.

12. A method for processing an audio signal, comprising
producing, in response to the audio signal, an excitation signal, and
calculating, in response at least to the excitation signal, a gain value G[t],
which, if applied to the audio signal, would result in a perceived loudness
substantially the same as a reference loudness, the calculating including an iterative
processing loop that includes at least one non-linear process.

13. The method of claim 12 wherein the iterative processing loop includes
calculating a perceptual loudness.

14. The method of claim 12 wherein said calculating is also in response to a 
measure of characteristics of the audio signal. 

15. The method of claim 14 wherein said at least one non-linear process 
includes a specific loudness calculation that selects, from a group of two or more 
specific loudness model functions, one or a combination of two or more of said 
specific loudness model functions, the selection of which is controlled by the 
measure of characteristics of the input audio signal. 

16. The method of claim 12 wherein the excitation signal is time smoothed
and/or the method further comprises time smoothing the gain value G[t].

17. The method of claim 16 wherein the excitation signal is linearly time 
smoothed. 

18. The method of claim 16 wherein the method further comprises smoothing
the gain value G[t], said smoothing employing a histogram technique.

19. The method of claim 12 wherein the iterative processing loop includes
time smoothing.

20. A method according to any one of claims 12 to 19, wherein the iterative
processing loop includes
adjusting the magnitude of the excitation signal in response to a function of an
iteration gain value G, such that the adjusted magnitude of the excitation signal
increases with increasing values of G, and decreases with decreasing values of G,
calculating a perceptual loudness in response to the magnitude-adjusted
excitation signal,
comparing the calculated perceptual loudness of the audio signal to a reference
perceptual loudness to generate a difference, and
adjusting the gain value G, in response to the difference so as to reduce the
difference between the calculated perceptual loudness and the reference perceptual
loudness.

21. A method according to claim 20, wherein the iterative processing loop, in
accordance with a minimization algorithm, repetitively adjusts the magnitude of the
excitation signal, calculates a perceptual loudness, compares the calculated
perceptual loudness to a reference perceptual loudness, and adjusts the gain value G,
to a final value G[t].

22. A method according to claim 21, wherein the minimization algorithm is in 
accordance with the gradient descent method of minimization. 


23. A method according to any one of claims 12 to 22, further comprising
controlling the amplitude of the input audio signal with the gain G[t] so that
the resulting perceived loudness of the input audio signal is substantially the same as
the reference loudness.

24. A method according to any one of claims 12 to 23 wherein the reference 
loudness is set by a user. 

25. A method according to any one of claims 12 to 23 wherein the reference
loudness is a perceptual loudness calculated by a process according to claim 13.
26. A method according to claim 1 or claim 12 wherein producing, in
response to the audio signal, an excitation signal, comprises
linearly filtering the audio signal by a function or functions that simulate the
characteristics of the outer and middle human ear to produce a linearly-filtered audio
signal, and
dividing the linearly-filtered audio signal into frequency bands that simulate
the excitation pattern generated along the basilar membrane of the inner ear to
produce the excitation signal.

27. A method according to claim 12 wherein said at least one non-linear
process includes calculating the specific loudness in each frequency band of the
excitation signal.

28. A method according to claim 27 wherein said calculating the specific
loudness in each frequency band of the excitation signal selects from a group of two
or more specific loudness model functions, one or a combination of two or more of
said specific loudness model functions, the selection of which is controlled by the
measure of characteristics of the input audio signal.

29. A method according to claim 20 wherein calculating a perceptual loudness
in response to the magnitude-adjusted excitation signal includes calculating the
specific loudness in respective frequency bands of the excitation signal.

30. A method according to claim 29 wherein said calculating the specific
loudness in each frequency band of the excitation signal selects, from a group of two
or more specific loudness model functions, one or a combination of two or more of
the specific loudness model functions, the selection of which is controlled by the
measure of characteristics of the input audio signal.
31. A method according to claim 30 wherein calculating a perceptual loudness
in response to the magnitude-adjusted excitation signal further comprises 

combining the specific loudness for each frequency band into a measure of 
perceptual loudness. 

32. A method according to any one of claims 13, 20, 21 and 23 wherein the
reference perceptual loudness is derived from a measure of the calculated perceptual 
loudness. 



33. A method according to claim 32 wherein the reference perceptual
loudness is a scaled version of the calculated perceptual loudness.

34. A method according to claim 32 wherein the reference perceptual
loudness is greater than the calculated perceptual loudness when the calculated
perceptual loudness is below a threshold and less than the calculated perceptual
loudness when the calculated perceptual loudness is above a threshold.



35. A method for processing a plurality of audio signals, comprising
a plurality of processes, each receiving a respective one of the audio signals,
wherein each process
produces, in response to the respective audio signal, an excitation
signal,
calculates, in response at least to the excitation signal, a gain
value G[t], which, if applied to the audio signal, would result in a
perceived loudness substantially the same as a reference loudness, the
calculating including an iterative processing loop that includes at least
one non-linear process, and
controls the amplitude of the respective audio signal with the gain
G[t] so that the resulting perceived loudness of the respective audio
signal is substantially the same as the reference loudness, and
applying the same reference loudness to each of the plurality of processes.



36. Apparatus adapted to perform the methods of any one of claims 1 through 35.



37. A computer program, stored on a computer-readable medium for causing a 
computer to perform the methods of any one of claims 1 through 35. 
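Claims 2 to 5 describe selecting, or mixing, two specific loudness model functions according to how narrowband or wideband the signal is, as measured by spectral flatness. The Python sketch below shows one way this could look for a single frame of band excitations. Its details are assumptions rather than anything stated in the claims: flatness is computed as the ratio of geometric to arithmetic mean over a hypothetical window of neighbouring bands, so that the narrowband/wideband decision is local to each band; that ratio is used directly as the weight of the wideband function; and the two power-law functions use hypothetical exponents, the narrowband one having the larger exponent so that, well above the threshold in quiet, it rises faster on a logarithmic scale.

```python
import math

E_QUIET = 1e-4    # hypothetical threshold-in-quiet excitation
ALPHA_NB = 0.30   # hypothetical exponent of the narrowband (faster-growing) function
ALPHA_WB = 0.20   # hypothetical exponent of the wideband function


def power_law_loudness(e: float, alpha: float) -> float:
    """Zero up to E_QUIET, then increasing monotonically with excitation as a power law."""
    return max(e, E_QUIET) ** alpha - E_QUIET ** alpha


def spectral_flatness(values: list[float]) -> float:
    """Geometric mean over arithmetic mean: close to 1 when flat, close to 0 when peaky."""
    vals = [max(v, 1e-12) for v in values]  # floor avoids log(0)
    geo = math.exp(sum(math.log(v) for v in vals) / len(vals))
    return min(1.0, geo / (sum(vals) / len(vals)))


def blended_specific_loudness(excitation: list[float], half_width: int = 2) -> list[float]:
    """Per-band specific loudness mixing the two functions by local spectral flatness."""
    n = len(excitation)
    out = []
    for m, e in enumerate(excitation):
        window = excitation[max(0, m - half_width):min(n, m + half_width + 1)]
        w = spectral_flatness(window)  # weight of the wideband function, in [0, 1]
        out.append(w * power_law_loudness(e, ALPHA_WB)
                   + (1.0 - w) * power_law_loudness(e, ALPHA_NB))
    return out


# A flat (wideband-like) frame versus a single-peak (narrowband-like) frame
flat = blended_specific_loudness([0.1] * 8)
peak = blended_specific_loudness([1e-6, 1e-6, 1.0, 1e-6, 1e-6])
```

Summing the returned per-band values gives a total perceptual loudness in the sense of claim 11. A flat frame ends up using essentially the wideband function in every band, while the isolated peak is dominated by the narrowband function.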
[Drawing sheets: FIGS. 2, 3, 4, 6, 10, 11, 12 and 13]
3. Claims Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).



Box III Observations where unity of invention is lacking (Continuation of item 3 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:

see additional sheet

1. As all required additional search fees were timely paid by the applicant, this International Search Report covers all
searchable claims.

2. □ As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment
of any additional fee.

3. As only some of the required additional search fees were timely paid by the applicant, this International Search Report
covers only those claims for which fees were paid, specifically claims Nos.:

4. ☒ No required additional search fees were timely paid by the applicant. Consequently, this International Search Report is
restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 1-11, 26, 36, 37

Remark on Protest
The additional search fees were accompanied by the applicant's protest.
No protest accompanied the payment of additional search fees.
This International Searching Authority found multiple (groups of)
inventions in this international application, as follows:

1. claims: 1-11, 26, 36, 37

Claim 26 is considered as far as it is dependent on claim 1;
claims 36 and 37 are considered as far as these are
dependent on any of claims 1-11.

These claims deal with strategies for calculating the
perceptual loudness of an audio signal.

2. claims: 12-37
(as far as claims 26, 36 and 37 are not referring to any of
claims 1-11):

These claims deal with calculating a gain value to be
applied to an audio signal in order to obtain a
predetermined perceived loudness.
Box No. VIII



1. This opinion contains indications relating to the following items:

Basis of the opinion
Priority
Non-establishment of opinion with regard to novelty, inventive step and industrial applicability
Lack of unity of invention
Reasoned statement under Rule 43bis.1(a)(i) with regard to novelty, inventive step or industrial
applicability; citations and explanations supporting such statement
Certain documents cited
Certain defects in the international application
Certain observations on the international application

2. FURTHER ACTION

If a demand for international preliminary examination is made, this opinion will usually be considered to be a
written opinion of the International Preliminary Examining Authority ("IPEA"). However, this does not apply where
the applicant chooses an Authority other than this one to be the IPEA and the chosen IPEA has notified the
International Bureau under Rule 66.1bis(b) that written opinions of this International Searching Authority
will not be so considered.

If this opinion is, as provided above, considered to be a written opinion of the IPEA, the applicant is invited to
submit to the IPEA a written reply together, where appropriate, with amendments, before the expiration of three
months from the date of mailing of Form PCT/ISA/220 or before the expiration of 22 months from the priority date,
whichever expires later.

For further options, see Form PCT/ISA/220.

3. For further details, see notes to Form PCT/ISA/220.
Box No. I Basis of the opinion

1. With regard to the language, this opinion has been established on the basis of the international application in
the language in which it was filed, unless otherwise indicated under this item.

□ This opinion has been established on the basis of a translation from the original language into the following
language, which is the language of a translation furnished for the purposes of international search
(under Rules 12.3 and 23.1(b)).

2. With regard to any nucleotide and/or amino acid sequence disclosed in the international application and
necessary to the claimed invention, this opinion has been established on the basis of:

a. type of material:
□ a sequence listing
□ table(s) related to the sequence listing

b. format of material:
□ in written format
□ in computer readable form

c. time of filing/furnishing:
□ contained in the international application as filed.
□ filed together with the international application in computer readable form.
□ furnished subsequently to this Authority for the purposes of search.

3. □ In addition, in the case that more than one version or copy of a sequence listing and/or table relating thereto
has been filed or furnished, the required statements that the information in the subsequent or additional
copies is identical to that in the application as filed or does not go beyond the application as filed, as
appropriate, were furnished.

4. Additional comments: 
Box No. II Priority

1. ☒ The following document has not been furnished:
☒ copy of the earlier application whose priority has been claimed (Rule 43bis.1 and 66.7(a)).
□ translation of the earlier application whose priority has been claimed (Rule 43bis.1 and 66.7(b)).
Consequently it has not been possible to consider the validity of the priority claim. This opinion has
nevertheless been established on the assumption that the relevant date is the claimed priority date.

2. □ This opinion has been established as if no priority had been claimed due to the fact that the priority claim
has been found invalid (Rules 43bis.1 and 64.1). Thus for the purposes of this opinion, the international
filing date indicated above is considered to be the relevant date.

3. □ The International Searching Authority has not been able to consider the validity of the priority claim because
a copy of the earlier application whose priority has been claimed was not available to the International
Searching Authority at the time that the search was conducted (Rule 17.1). This opinion has nevertheless
been established on the assumption that the relevant date is the claimed priority date.

4. Additional observations, if necessary:



Box No. V Reasoned statement under Rule 43bis.1(a)(i) with regard to novelty, inventive step or
industrial applicability; citations and explanations supporting such statement

1. Statement

Novelty (N)                      Yes: Claims 1-11, 26, 36, 37
                                 No:  Claims
Inventive step (IS)              Yes: Claims 1-11, 26, 36, 37
                                 No:  Claims
Industrial applicability (IA)    Yes: Claims 1-11, 26, 36, 37
                                 No:  Claims

2. Citations and explanations
see separate sheet



Box No. VIII Certain observations on the international application

The following observations on the clarity of the claims, description, and drawings or on the question whether the
claims are fully supported by the description, are made:

see separate sheet

Form PCT/ISA/237 (January 2004)



WRITTEN OPINION OF THE INTERNATIONAL SEARCHING AUTHORITY (SEPARATE SHEET)

International application No. PCT/US2004/016964



V. Reasoned statement under R. 43bis.1(a)(i)

1. In this section, a statement is given only for claims 1-11 and for claims 26, 36 and 37
as far as these refer to any of claims 1-11.

The application deals with loudness measurements of audio signals. 

The closest prior art is disclosed in International Search Report document D1
(Moore et al.: "A Model for the Prediction of Thresholds, Loudness, and Partial
Loudness", J. Audio Eng. Soc., Vol. 45, No. 4, April 1997, pp. 224-240). This
document proposes an equation for calculating the specific loudness based on the
excitation in a frequency band.

Starting from D1 the present application seeks to refine the calculation of the 
perceived loudness. 

In order to achieve this, a measure of characteristics of the audio signal itself is 
used to control the selection of one or a combination of loudness model functions 
out of an available plurality of models. 

This is claimed in present independent method claim 1.

Since neither a disclosure of, nor even a vague hint towards, controlling the model
selected for the specific loudness calculation has been found in the prior art,
independent method claim 1 appears to fulfill the requirements of novelty and
inventive step as set out in Art. 33(2) and (3) PCT.

VIII. Certain observations on the international application

2. In dependent claims 2-4, the application uses the terms "narrowband" and
"wideband" in a sense different from the usual meaning of these terms in the field
of audio coding. Rather than referring to the overall bandwidth of the audio signal,
"narrowband" and "wideband" are used here in the context of each frequency band
considered for calculating the perceptual loudness of the audio signal (see the
description, p. 19, l. 20 - p. 20, l. 19).

In order to overcome the resulting lack of clarity (Art. 6 PCT) of the claims, these
claims should be amended accordingly.

Form PCT/Separate Sheet/237 (Sheet 2) (EPO-January 2004)



